Monday, November 7, 2011

New demos

Today we released a new demo that shows semantic search over books.


The idea is that the user provides some text, which is then matched against book descriptions. It shows the power of semantic matching over keyword-based systems.




Two different types of input can be provided. The first is a descriptive paragraph. My example here is the classic pangram, "The quick brown fox jumps over the lazy dog". The results are in the screenshot below, and all are related in some way to 'fox'.




How does this improve over keyword searches? Well, the main thing is that Roistr looks for the gist behind a document. It neatly deals with synonyms and has word-sense disambiguation built in to try to focus on the most meaningful match. It seeks out affinity beyond the purely lexical. The book choices are also intriguing. They're less like a normal library topic search and more like a knowledgeable librarian's recommendations.
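To make the difference concrete, here is a toy sketch in Python. It is not Roistr's engine: the hand-made `CONCEPTS` table is a hypothetical stand-in for real semantic knowledge, and the book titles and descriptions are invented for illustration. It ranks descriptions by cosine similarity twice, once on raw keywords and once after mapping words to shared concepts.

```python
import math
from collections import Counter

# Hypothetical, hand-made table standing in for real semantic knowledge.
# A real engine would learn or look up these relationships.
CONCEPTS = {
    "fox": "canine", "dog": "canine", "wolf": "canine",
    "quick": "fast", "swift": "fast",
    "jumps": "leap", "leaps": "leap",
}

def tokens(text):
    """Lower-case the text and strip simple punctuation from each word."""
    words = (w.strip(".,!?'\"").lower() for w in text.split())
    return [w for w in words if w]

def keyword_vector(text):
    """Plain bag of words: only identical words can match."""
    return Counter(tokens(text))

def concept_vector(text):
    """Bag of concepts: synonyms collapse onto the same key."""
    return Counter(CONCEPTS.get(t, t) for t in tokens(text))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query, descriptions, vectorize):
    """Return (score, title) pairs, best match first."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(text)), title)
              for title, text in descriptions.items()]
    return sorted(scored, reverse=True)

# Invented book descriptions, for illustration only.
BOOKS = {
    "Wolf Country": "A swift wolf leaps across frozen rivers",
    "Tax Law": "A guide to corporate tax filing",
}
QUERY = "The quick brown fox jumps over the lazy dog"

keyword_ranking = rank(QUERY, BOOKS, keyword_vector)
concept_ranking = rank(QUERY, BOOKS, concept_vector)
```

With these inputs, the keyword ranking scores both books at zero because no word is shared with the query, while the concept ranking puts "Wolf Country" first. A lookup table like this is far cruder than word-sense disambiguation, but it shows why matching on meaning can find books that a keyword search misses.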

We haven't tested this with users yet, but I get the feeling that the accuracy is higher than a keyword search when used with real-world queries. I'm personally quite pleased with it.


The second type of input is a Twitter name. It can be anyone's, not just your own. The 20 most recent Tweets from that account are taken and used in place of the descriptive paragraph.
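As a rough sketch of that step, the Tweets can be folded into a single query document. The code below assumes the Tweets have already been fetched as hypothetical `(timestamp, text)` pairs, and the clean-up rules (dropping URLs and @mentions, keeping hashtag words) are my own assumption rather than a description of what the demo does.

```python
import re

MAX_TWEETS = 20  # the demo uses the 20 most recent Tweets

def tweets_to_query(tweets, limit=MAX_TWEETS):
    """Join the most recent tweets into one block of text.

    `tweets` is a list of (timestamp, text) pairs; this shape is
    hypothetical and stands in for whatever the fetch step returns.
    """
    # Newest first, then keep only the most recent `limit` tweets.
    recent = sorted(tweets, key=lambda t: t[0], reverse=True)[:limit]
    cleaned = []
    for _, text in recent:
        text = re.sub(r"https?://\S+", " ", text)  # assumed: drop links
        text = re.sub(r"@\w+", " ", text)          # assumed: drop mentions
        text = re.sub(r"#(\w+)", r"\1", text)      # keep the hashtag word
        text = " ".join(text.split())              # collapse whitespace
        if text:
            cleaned.append(text)
    return " ".join(cleaned)

example = tweets_to_query([
    (1, "Reading about #foxes http://t.co/abc @bob"),
])
```

The resulting string can then be treated exactly like the descriptive paragraph from the first input type.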




The results show the book descriptions that are most semantically similar to the Tweets. This may or may not reflect the person's real interests, but if people Tweet about something, it's more likely that they're interested in it.

Here are the results using the Twitter name:




The usual provisos apply: this is an early, alpha-quality demo (the designer in me is going nuts!), but it does seem to work reliably. I had the engine process almost 200k documents over the weekend without a problem.

In other words, all this is early work but very promising. The engine's quite stable and simple - much like a Unix tool, it does one job but does it very well - and can fit into a number of frameworks easily. A testament to this is that the code for this demo was written in a few hours (including the web interface, which was the most time-consuming part).