Monday, November 7, 2011

New demos

Today we released a new demonstration that shows semantic search over books.

The user provides some text, which is then matched against book descriptions. The demo is meant to show the advantage of semantic matching over keyword-based systems.

Two kinds of input can be provided. The first is a descriptive paragraph. My example here is the classic line "The quick brown fox jumps over the lazy dog". The results are in the screenshot below, and all of them relate in some way to 'fox'.

How does this improve on keyword searches? Mainly, Roistr looks for the gist of a document rather than its exact words. It handles synonyms neatly and has word-sense disambiguation built in, so it tries to focus on the most meaningful match. In other words, it looks for affinity beyond the purely lexical. The book choices are also intriguing: they're less like a library topic search and more like a knowledgeable librarian's recommendations.
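Roistr's own method isn't described here, so what follows is only a toy sketch of the general idea behind semantic matching, using a swapped-in technique: averaged word vectors compared with cosine similarity. The `CONCEPTS` table and its values are hypothetical and hand-made (a real system would learn them from a large corpus), and the sketch does no word-sense disambiguation. It shows a query and a document that share no words still scoring as close, while a plain keyword count sees nothing in common.

```python
import math

# Hypothetical, hand-made "concept" vectors, invented purely for illustration.
# Related words (fox/vixen, dog/hound) get similar values; finance words do not.
CONCEPTS = {
    "fox": (0.9, 0.1, 0.0),
    "vixen": (0.85, 0.15, 0.0),
    "dog": (0.7, 0.3, 0.0),
    "hound": (0.68, 0.32, 0.0),
    "stock": (0.0, 0.1, 0.9),
    "market": (0.0, 0.2, 0.8),
}


def keyword_overlap(query, doc):
    """Number of distinct words shared by query and doc (a purely lexical match)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def doc_vector(text):
    """Average the concept vectors of known words; unknown words are skipped."""
    vecs = [CONCEPTS[w] for w in text.lower().split() if w in CONCEPTS]
    if not vecs:
        return (0.0, 0.0, 0.0)
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def cosine(a, b):
    """Cosine similarity in [-1, 1]; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = "fox dog"
for doc in ["vixen hound", "stock market"]:
    # Neither document shares a word with the query, but only one shares its meaning.
    print(doc, keyword_overlap(query, doc), round(cosine(doc_vector(query), doc_vector(doc)), 3))
```

Both documents get a keyword score of zero, but only "vixen hound" scores close to 1 on cosine similarity.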

We haven't tested this with users yet, but my impression is that it is more accurate than a keyword search on real-world queries. I'm personally quite pleased with it.

The other option is to enter a Twitter name (anyone's, not just your own). The 20 most recent Tweets from that account are then used in place of the descriptive paragraph.

The results are the book descriptions most semantically similar to those Tweets. These may or may not reflect the person's real interests, but if people Tweet about something, they are more likely to be interested in it.
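The Twitter option can be sketched the same way. This is a minimal illustration under stated assumptions, not Roistr's implementation: `recommend_from_tweets`, `embed` and the `books` dictionary are hypothetical names, the Tweets are assumed to arrive newest first, and `toy_embed` is a deliberately trivial stand-in that exists only so the example runs. The point is the flow: the 20 most recent Tweets are joined into one document, which then takes the place of the descriptive paragraph.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_from_tweets(
    tweets: List[str],               # assumed to be ordered newest first
    books: Dict[str, str],           # hypothetical: title -> description
    embed: Callable[[str], Vector],  # hypothetical semantic embedding function
    n_tweets: int = 20,
    top_k: int = 5,
) -> List[str]:
    """Return the top_k book titles most similar to the most recent Tweets."""
    # The latest n_tweets Tweets are joined into one text, standing in for a typed paragraph.
    query_vec = embed(" ".join(tweets[:n_tweets]))
    scored = [(cosine(query_vec, embed(desc)), title) for title, desc in books.items()]
    # Sort by score only, highest first; ties keep the dictionary's order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_k]]


# Deliberately trivial stand-in for a semantic model, only so the example runs:
# it just counts two words, which is the keyword approach Roistr aims to improve on.
def toy_embed(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count("fox")), float(words.count("space"))]


books = {
    "fox_book": "a clever fox outwits three farmers",
    "space_book": "a journey through space and time",
}
tweets = ["saw a fox in the garden", "another fox tonight"]
print(recommend_from_tweets(tweets, books, toy_embed, top_k=1))
```

Keeping `embed` as a parameter separates the ranking loop from whatever model produces the vectors.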

Here are the results using the Twitter name:

The usual provisos apply: this is an early alpha demo (the designer in me is going nuts!), but it does seem to work reliably. I had the engine process almost 200k documents on the weekend without a problem.

In short, this is early work but very promising. The engine is stable and simple (much like a Unix tool, it does one job and does it well), so it fits easily into a number of frameworks. As evidence, the code to run this demo was written in a few hours, including the web interface, which was the most time-consuming part.

Friday, November 4, 2011

How reliable is Roistr?

One question we've been asked is how reliable Roistr is under stress. It works fine in the limited demonstrations, but can it handle something closer to real-world work?

Lately we've been preparing two more detailed demonstrations. Both link someone's social media to a list of a) book reviews and b) movie reviews (the latter is for internal use only right now). Although limited in scope, the data sets are closer to real-world ones: the book reviews cover over 14,000 books and the movie reviews over 150,000. So far the engine has worked solidly and produced results without a single blip.

The problem we do have is speed: the 14,000+ books took 30 minutes. Calculating each vector is not trivial, since retrieving each one takes many millions of floating-point operations. We would like to reduce this time, but this cost is also where Roistr offers real value over existing methods: keyword matching is quicker, but it isn't as good. Mimicking human performance takes a lot of effort.
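As a rough back-of-the-envelope figure, assuming the 30 minutes covers the whole run over about 14,000 books, the average time per book is at most about:

```latex
\frac{30\ \text{min}}{14{,}000\ \text{books}}
  = \frac{1800\ \text{s}}{14{,}000\ \text{books}}
  \approx 0.13\ \text{s per book}
  \quad (\approx 7.8\ \text{books per second})
```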

We're happy to accept slower results because they're more accurate, and what matters most is maximising relevance.

If you would like a demonstration or field test of Roistr, contact me, Alan Salmoni, and I'll see what we can organise for you.