Wednesday, January 16, 2013

Prolog and NLP

Prolog has been a fun language to use. It's like how I thought programming computers would be before I ever programmed: you tell it things (facts and rules) and then ask it questions to find answers.

Prolog used to be very common in natural language processing, but much less so now, with languages such as Python and Java taking most of the attention.

But, of course, if the only tool you know is a hammer, then every problem looks like a nail. What Prolog and other declarative languages can do is exceptional, but I don't see it as the entire solution. At Roistr, we've always said that Python is our mainstay language, the one we use to create statistical models of text. Computational linguistics has fallen out of favour somewhat lately, but I wondered how to combine the two.

As an introduction, Prolog can take a query and return true / yes (the statement can be proved from what it knows) or false / no (it can't). There are other possible responses, but we can worry about those some other time.

The concern I have is that Prolog doesn't have the grey areas that naturally occur in real life. My thought was: what if we could create a Prolog engine and supply it with explicit facts and rules? Queries that relied solely upon those facts and rules would be dealt with as normal. However, the engine would also be able to deal with implicit knowledge by consulting a semantic relevance engine.

So if it were given facts about the things that John owns but no mention of a car, the semantic relevance engine could step in and realise that 'car' is almost synonymous with the 'automobile' fact that does appear.
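To make the idea concrete, here's a minimal Python sketch of that fall-back. Everything in it is hypothetical: the fact tuples, the `query` function, and especially the similarity scores, which in a real system would come from a semantic relevance engine rather than a hand-written table.

```python
# Toy similarity table standing in for a real semantic relevance engine.
# All scores here are made up for illustration.
SIMILARITY = {
    ("car", "automobile"): 0.95,
    ("car", "boat"): 0.20,
}

def similarity(a, b):
    """Symmetric lookup in the toy similarity table."""
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))

# Explicit knowledge base: John owns an automobile, but no 'car' fact.
FACTS = {("owns", "john", "automobile")}

def query(pred, subj, obj, threshold=0.8):
    """Answer from explicit facts first; otherwise fall back to the
    strongest semantic match above the threshold."""
    if (pred, subj, obj) in FACTS:
        return ("explicit", True)
    best_score = max(
        (similarity(obj, fact_obj)
         for p, s, fact_obj in FACTS if p == pred and s == subj),
        default=0.0,
    )
    if best_score >= threshold:
        return ("implicit", best_score)
    return ("explicit", False)
```

Asking `query("owns", "john", "car")` misses the explicit facts but comes back as an implicit yes via the car/automobile association, while `query("owns", "john", "boat")` stays false because the association is too weak.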

It could also handle unknown facts and infer rules from the semantic associations the engine provides. Say it's provided with a fact about John that says he's in France. When asked whether he's in Europe, it would search its explicit knowledge base, which returns false, or more correctly unknown (because it's not explicitly stated or ruled that France is in Europe). But the engine would note the close association between the concepts 'France' and 'Europe' and allow the program to infer a likelihood.

So let's work this into an example:

located_in(john, france).
?- located_in(john, europe).
Explicit false
Implicit 0.87 true
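The two-tier answer above could be prototyped in Python before touching Prolog at all. This is a sketch under stated assumptions: the predicate names, the `ask` function, and the 0.87 association score are all illustrative stand-ins, with the score hard-coded here rather than produced by a real semantic relevance engine.

```python
# Hypothetical association scores from a semantic relevance engine.
ASSOCIATION = {("france", "europe"): 0.87}

# Explicit knowledge base: the only stated fact about John.
KB = {("located_in", "john", "france")}

def ask(pred, subj, obj, threshold=0.8):
    """Return both the explicit answer and an implicit likelihood,
    mirroring the 'Explicit false / Implicit 0.87 true' output above."""
    if (pred, subj, obj) in KB:
        return {"explicit": True, "implicit": 1.0}
    # Fall back: look for a known fact whose object is strongly
    # associated with the queried object.
    score = max(
        (ASSOCIATION.get((o, obj), ASSOCIATION.get((obj, o), 0.0))
         for p, s, o in KB if p == pred and s == subj),
        default=0.0,
    )
    return {"explicit": False, "implicit": score if score >= threshold else 0.0}
```

Here `ask("located_in", "john", "europe")` is explicitly false but implicitly likely, while a query with no strong association stays at zero.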

So the semantic relevance engine would be a back-up knowledge base, used to provide factual knowledge and inference. It's a slightly odd thing to think about, and it needs a lot of work before the validity of the notion could be tested.