Tuesday, December 27, 2011

Experiment is coming soon

We're planning a little experiment soon to test how well Roistr can match book descriptions with Tweets. We'll be getting people to rate each of ten books according to how interesting it is to them personally.

Watch this space because we'll be asking for participants soon.

Thursday, December 22, 2011

Personalised Advertising Guided by Social Media

Summary

Current advertising selections are determined by a number of factors, often combined to produce content that is relevant to the user. Increased relevance results in increased conversions.

I'll discuss some of the ways in which products are targeted towards potential buyers, along with their benefits and drawbacks. There's also some talk about the future directions of advertising.

How are particular groups targeted now?

1. No targeting
The most obvious answer is that in many cases they're not. Adverts are simply put up in the hope that they'll hit at some point. The benefit is that you'll reach most people; the drawback is that a lot of people who have no interest whatsoever will also be reached. Even with Internet communications being cheap, this can still result in wasted effort and money.

2. Demographics
Traditionally, marketers try to understand their customers (potential or current) by grouping them, primarily using demographics. This works on the assumption that members of each group have similar, predictable characteristics (they have similar habits and buy similar things). In some ways, this is almost a type theory, and type theories are not widely accepted in contemporary psychology. The benefits are that doing this does increase conversions over zero targeting, and that groups can be readily identified once a customer's information has been provided. The drawback is that getting this information is hard - companies are often willing to pay a lot of money for this type of data about a market. The more specifically a product can be targeted, the greater the chance of conversion; but targeting more specifically requires an increasing amount of information about potential buyers. This can be hard to come by, particularly as a lot of it might not be obvious or accessible in large-scale, survey-type research.

3. Stated interests
Potential buyers might be asked to state what their preferences are. This can help filter out irrelevant advertising materials and focus on relevant ones, leading to increased conversions. The drawbacks are that this information can be hard to get, and that it might simply not be true.

4. Purchase history
Another form of prediction by groups uses prior purchases. The theory is that if two people buy product A and the first person also bought product B, then the second person is more likely to buy product B than someone who didn't buy A. The advantages are that it requires no demographic information about buyers, and that it increases conversions to a level above chance. The disadvantages are that it relies upon the assumption of types of people; that the data analysis required is quite heavy (often with very large data sets); that the data can be difficult to get in the first place; and that it doesn't account for transience - trends or influences that act upon large groups of people for short periods of time.
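
As a toy illustration (the data and function names here are made up for this post, not taken from any real system), the "people who bought A also bought B" logic can be sketched in a few lines of Python:

# Count how often pairs of products are bought together, then recommend
# the products that most often co-occur with a given purchase.
from collections import defaultdict
from itertools import combinations

purchases = {
    "alice": {"A", "B", "C"},
    "bob": {"A", "C"},
    "carol": {"B", "D"},
}

co_counts = defaultdict(int)
for basket in purchases.values():
    for x, y in combinations(sorted(basket), 2):
        co_counts[(x, y)] += 1
        co_counts[(y, x)] += 1

def recommend(product, already_bought):
    scores = {other: count for (p, other), count in co_counts.items()
              if p == product and other not in already_bought}
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("A", {"A"}))  # ['C', 'B'] for this toy data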

5. Contextual adverts
Google popularised the contextual advert. These rely upon analysing the text of whatever page a person visits or whatever query they search for, and producing semantically similar adverts in response. They are held to have high levels of conversions due to increased relevance, but they fail in one aspect: they take into account the context but not the person behind it. "We are what we search for" is not enough, because many searches are for things that do not define us but rather meet temporary or one-off information needs.
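
To make the idea concrete (this is only a toy illustration of matching adverts to context, not a description of Google's actual system), imagine scoring candidate adverts by the words they share with a search query:

# Score adverts by word overlap (Jaccard similarity) with the query.
def tokens(text):
    return set(text.lower().split())

adverts = {
    "hiking boots": "Rugged waterproof boots for mountain hiking trails",
    "office chairs": "Ergonomic office chairs for long working days",
}

query = tokens("best boots for hiking in wet mountains")
scores = {name: len(query & tokens(desc)) / float(len(query | tokens(desc)))
          for name, desc in adverts.items()}

print(max(scores, key=scores.get))  # "hiking boots" wins for this query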

6. Social network
Facebook recently announced its intention to use the social graph to target advertisements. The theory is that what your friends buy is what you will want to buy. This may work, but it may not, and it ignores the realities of social networking. Many people use social networks with family (do I really want to buy what my daughter or my father wants?), for professional promotion (a recruiter I've chatted with once bought a particular car; that will influence my decision not at all), or with people we have non-permanent relationships with (I can 'friend' people I haven't seen since primary school - how do their purchasing decisions relate to mine?).

These methods offer a partial solution. If done well, they will increase conversions, which makes it easy to be complacent and maybe even to think the problem is solved as well as it possibly can be. The most effective advertising and marketing units will use a range of methods that complement each other to provide as full a coverage as possible.

But if we explore newer techniques, we might find other ways to complement these existing methods.

Trade-offs
There appears to be a trade-off. For an advert to have a better chance of producing a sale, it has to be targeted more specifically. But more specific targeting requires data gathering and analysis before any advertising takes place.

But these methods still don't address the person behind the advert. People are either not considered at all, treated as uniform members of a group (which is successful but could be greatly improved), or required to provide data about themselves up-front. The only exception is the contextual advert, which takes no account of the person but only of their current information need.

Future methods
The holy grail of advertising is a method that takes into account not just the context but also the person behind it, in a way that requires as little a priori data gathering and analysis as possible - preferably none. These data must also be honest: the possibility of potential buyers giving misleading information should be low. Finally, the data must be obtained with the permission of the potential buyer. Not having this permission could backfire and turn a potential customer into one who refuses to do business.

So, where?
This leads to the question: where can we get such information?

The information sources must:

  1. Be made publicly available with users' permission, preferably express permission
  2. Be retrieved for low or effectively zero cost
  3. Be about a single person
  4. Give a description of a person at the personal level
  5. Give some indication of the person's current context
  6. Provide a degree of authenticity

The solution
One answer is social media. Facebook, Twitter, and GooglePlus all offer information that is (often) publicly available and highlights the concerns of interest to an individual at a personal and contextual level. If a matter was not relevant to someone, why would they write about it?

But this information is hard to analyse. There are no neat forms with precise Likert scales, no specifically expressed interests and the like. It's plain, natural text written to be understood by other humans. It needs more preparation than most companies can put into a single potential sale before it can be analysed. Methods such as human-performed content analysis can categorise propositions according to set criteria, and this information is gold. Human methods, however, don't scale very well.

But there is hope. Methods within artificial intelligence, specifically natural language processing, can analyse such text within a representation or 'map' of language. From this, we can see how closely related two pieces of text are. Or, in other words, we can relate someone's Facebook posts to a range of product descriptions.

Natural language processing techniques can help you understand how similar two pieces of text are: one being a person's social media posts and the other being a range of product or service descriptions. The assumption is that the more similar an advert is to someone's social media, the more relevant it will be to that person. This means higher conversions and greater sales.
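
As a rough sketch of this kind of matching (not Roistr's actual engine - the texts and library choices here are just for illustration), each text can be turned into a vector and compared with cosine similarity:

# Represent each text as a TF-IDF vector, then score each product
# description by its cosine similarity to the person's posts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = "Spent the weekend trail running and planning a long hiking trip"
products = [
    "Lightweight trail running shoes with breathable mesh",
    "A slow cooker recipe book for easy family dinners",
]

matrix = TfidfVectorizer().fit_transform([posts] + products)
scores = cosine_similarity(matrix[0:1], matrix[1:]).flatten()

for product, score in sorted(zip(products, scores), key=lambda pair: -pair[1]):
    print(round(score, 3), product)  # the running shoes should score highest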

There are many methods within natural language processing to estimate relevance. Google themselves use a system that was (and may still be) reliant upon WordNet, a large lexical database that records words and how they relate to each other.
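
WordNet itself can be explored directly, for example through NLTK (a small illustration of WordNet rather than of Google's internal system):

# Measure how related two words are via their positions in WordNet.
# Requires the WordNet data: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

car = wn.synset('car.n.01')
truck = wn.synset('truck.n.01')
banana = wn.synset('banana.n.01')

print(car.path_similarity(truck))   # relatively high: both are vehicles
print(car.path_similarity(banana))  # much lower: distant concepts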

Text analysis is used to identify topics within text and sentiment analysis is used to understand general feelings or attitudes towards something.

At Roistr, we use a combination of methods to understand the underlying meanings of documents. These are used to gauge the proximity of two or more documents from which we infer relevance. If two pieces of text are close then they are similar in meaning.

Evidence
We're doing an experiment soon using Amazon's top ten best sellers and asking people with a Twitter account to rate the ten books on a scale of how much each book interests them personally. We will then take each person's public Tweets and score the same books using our semantic relevance engine. The two sets of results will be compared: each person's own judgements against the rankings our personalised advertising produces from that person's tweets.
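
One way the comparison might be made (an assumption on our part about the analysis, with made-up numbers below) is a rank correlation between a person's interest ratings and the engine's relevance scores:

# Compare a person's ratings of ten books with the engine's scores
# using Spearman's rank correlation.
from scipy.stats import spearmanr

human_ratings = [5, 3, 4, 1, 2, 5, 2, 3, 4, 1]          # hypothetical ratings
engine_scores = [0.81, 0.40, 0.65, 0.10, 0.22,          # hypothetical relevance
                 0.77, 0.30, 0.35, 0.70, 0.05]

rho, p_value = spearmanr(human_ratings, engine_scores)
print(rho, p_value)  # rho near 1 means the engine ranks books like the person does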

Hopefully, this will give us some numbers to help us see whether it's possible to predict the most relevant product from a person's tweets. The experiment will be released soon and we will publish the results both here and in a white paper that will be free to download.

Sunday, December 11, 2011

Using Tornado as a web framework at Roistr

At Roistr, one of our essential tools is the Tornado web server. We use it as a web framework, as part of our workflow (in conjunction with other tools) for generating web pages. Its templates are easy to use, and this post shows how to include HTML from different files.

Each Tornado program has a main program that runs everything, and each page has its own handler class that inherits from tornado.web.RequestHandler. This class has methods for handling get and post requests.
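
A minimal example along these lines (the handler and template names here are illustrative, not Roistr's actual code):

# A tiny Tornado application: one handler renders uniques.html, which
# pulls in header.html and friends via the include directives shown below.
import tornado.ioloop
import tornado.web

class UniquePageHandler(tornado.web.RequestHandler):
    def get(self):
        self.render("uniques.html")

application = tornado.web.Application(
    [(r"/", UniquePageHandler)],
    template_path=".",  # directory holding the .html template files
)

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()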

Roistr itself is composed of a number of template features (bits of HTML common to all pages) and unique code (HTML unique to a single page). Being able to keep those common bits in their own files makes it easy to update across the site - change one file (say, the menu) and all the pages change.

In Roistr, we store the header, the menu and the footer in separate files. If we change any of these files, all web pages are updated. This saves work, saves testing effort, and makes it easier to maintain the site, particularly when changes are needed.

But how to incorporate this into a web page?

It's simple. For an example, let's say the header is contained in a file called header.html. We want to include this with a page's unique content. The header contains the opening tags of the page all the way up to (and including) the opening <body> tag.

We go to the unique page (let's call it uniques.html) and the first thing we enter (because the header comes first) is this:

{% include header.html %}

This instructs Tornado to take all the code from header.html and include it in the page it sends out. After that line, we continue with the unique page's content.

So generally, a Roistr page consists of:

{% include header.html %}
{% include menu.html %}
blah... blah... blah...
{% include footer.html %}

And now if I want to change anything in the header, the menu, or the footer across the site, I can change one of the above files and the whole site changes.

If you wanted to set up your own CMS, you could easily do so just by dividing up your pages into modules and loading each module - just like existing CMSs do. It's a bit more work that way, but not really that much compared to learning Joomla or Drupal from scratch. Plus it's probably easier than building a custom site that requires significant modification of the Joomla or Drupal code.

I guess it depends on what you want. Personally, I like to be in total control of the HTML/CSS code that gets churned out because a) I might radically change it, and b) I'm in control of updates. Of course, you have all the problems on your own plate but that's part of the choice.

Best of luck!

Natural Language Interfaces

Here at Roistr, we're working to push the boundaries of UX as hard as we can. We're just a small company but very motivated and passionate about ensuring the best possible UX for everyone.

Does our work have any effect on this desire? Well, yes. The deeper part is that we're pushing hard to make the next generation of natural language interfaces. Our ultimate goal is to pass the Turing test, something many very capable people have failed to do. It's good to have an ambition though :-)

So what role does the current incarnation of our semantic relevance engine have on natural language interfaces?

We can reduce the amount of effort it takes to get things done, which is good UX in my book. For example, in one business use case we have for recruiters, we can storm ahead easily. I recently looked at one recruitment website and it took me 20 minutes to go through the sign-up forms before I got fed up and abandoned the process.

With Roistr, I just upload a resume or CV, and that can be used to match me to particular jobs. Plus it does a reasonable job of matching which is great news. Uploading a resume or even copying / pasting it is much quicker than having to type in my work details *yet again*.

Another thing Roistr can do is to understand what concepts lie at the heart of what people write. I've used it already in tests to elicit the core concept from a document and it works surprisingly well. This can be used for automatic summarisation, categorisation and a whole host of other applications.

We're planning a sitemap tool whereby we can take a breakdown of content and reform it into an information architecture. In my own experience of card sorts, people organise content according to meaning (whether topic or function), and being able to measure similarity of meaning means we're able to associate similar things. From this, we can build an IA in just a few seconds, even for amounts of content that couldn't realistically be put into a card sort.
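
A rough sketch of the underlying idea (not the planned tool itself - the pages and library here are just placeholders) is to group pages by the similarity of their text, so that related content lands in the same section:

# Cluster pages by TF-IDF similarity; pages sharing a cluster label
# would sit under the same section of the information architecture.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

pages = {
    "returns.html": "How to return a product and get a refund",
    "shipping.html": "Delivery times, shipping costs and parcel tracking",
    "careers.html": "Current job openings and how to apply",
    "benefits.html": "Employee benefits, holidays and pensions",
}

vectors = TfidfVectorizer().fit_transform(pages.values())
labels = KMeans(n_clusters=2, random_state=0).fit_predict(vectors)

for name, label in zip(pages, labels):
    print(label, name)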

Another possibility is making the job of keeping content useful easier. We're considering writing an extension for SharePoint that organises content according to meaning, much like a human would. We can also identify possible duplicates, which for large corporate intranet sites is very useful. A better UX comes from content that is up-to-date, timely and relevant, so Roistr can play a big role in making the world a more sensible place.

As I said, our ultimate aim is to have a machine that you can have a sensible conversation with; something that's not like talking to a socially-inappropriate amnesiac but more like a real person. The possibilities we have are endless...

Thursday, December 1, 2011

How to work this?

Our concept of Roistr is that it works as SaaS - software as a service - running whenever clients request it. Whenever a business needs to understand the meaning of a set of documents or perform a common operation (like finding which documents in a collection are most similar to a standard), they can access our RESTful API and get results instantly.

But recently, Roistr was accepted onto Microsoft's BizSpark programme. This is a nice bonus because it gives us links to lots of other companies, so it's a good networking opportunity. It also provides access to a load of MS software, and this got me thinking about whether we should offer the semantic relevance engine as a software product - something boxed or downloaded and used locally rather than purely online. It's certainly possible, assuming that anyone who runs it has a powerful enough machine (multi-core ideally, which isn't so rare these days).

If so, we'll need to work out how this can be done. Roistr's API is RESTful, so it should integrate easily with anything else. A simple HTTP call is made, and this can be done via any language worth its salt.
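
For illustration, a call might look something like this from Python (the endpoint URL, parameters and response format below are assumptions for the sake of the example, not the documented Roistr API):

# Send a reference document and some candidates, get back relevance scores.
import json
import urllib.request

payload = json.dumps({
    "reference": "Description of the standard document",
    "candidates": ["First document to compare", "Second document to compare"],
}).encode("utf-8")

request = urllib.request.Request(
    "https://api.example.com/relevance",  # placeholder URL
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    print(json.load(response))  # e.g. a similarity score for each candidate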

But is there an advantage to be gained from offering a Java API? A C# or MS .Net API? We're unsure about this, but we now have the tools to make a .Net API thanks to MS.