All-pairs comparisons for dense matrices
Some of my recent posts described my woes in trying to figure out how I could quickly identify the most likely subset of vectors that match a candidate vector without repeatedly calculating cosine similarity.
I have found a method that appears to work! I won't describe it here because it needs more testing. Falsifications can and should be reported soon, but confirmations should be announced cautiously!
But if it works, I can reduce a set of vectors down to a fraction using fewer calculations. It's not ideal, but effective indexing and searching in a database should, in theory, substantially improve performance to the point where we can easily identify the best matching subset with relatively little ease.
Making a new search engineSo why is this being done? Why do I want to be able to take a mulit-billion element index of vectors and quickly and cheaply identify a tiny (say 100-item) subset before conducting the cosine comparisons?
Well, that would be giving it away but the subheading is fairly clear.
Besides, we're figuring out a way to store a 3+ billion page search engine on a terabyte of space. Yes, it's possible and yes, we're going to try it.
Once we've sorted out a few more problems!