Thursday, August 22, 2013

Using Wordnet from Prolog – part one

Prolog has many uses in natural language processing, and here is an introduction to using Wordnet through Prolog with examples of common tasks.

I enjoy learning Prolog because it conforms to how I thought computers should be programmed before I'd actually programmed one. You'd specify a set of rules, and the computer came back with answers that conformed to those rules.

But Prolog is also handy at natural language processing (NLP) and one of these endeavours is the famous Wordnet being made available in this fascinating language. This is well cool. I've used Wordnet lots in Python but rarely in Prolog.

First, download the Prolog Wordnet effort here (documentation here). I tried using it in GProlog but got an "atom table full" error. SWI Prolog handled it well enough.

Once downloaded, open it up and open a terminal (if not already), and cd to the directory the Wordnet files were unzipped to. Then begin your Prolog and type

[wn_s].

Remember the following full-stop / period.

Then, searching for a word's synsets was easy:

s(ID, W_num, 'ease', Ss_type, Sense_num, Tag_count).

And I got back a range of synsets related to the word, ease.

ID = 101064148
W_num = 2
Ss_type = n
Sense_num = 5
Tag_count = 0
Which means the Wordnet ID for that sense is 101064148, it's the second synset (W_num = 2), this synset has 'ease' as a noun (Ss_type = n), there are a total of 5 senses for 'ease' (Sense_num = 5) and there are no tags (Tag_count = 0).

Press the semi-colon to cycle through them all.

This is so simple. And if you want a particular sense (e.g., the first synset of 'ease'), just specify it thus:

s(ID, 1, 'ease', As_type, Sense_num, Tag_count).

As_type mostly reports words as type 'n' which means 'noun'. It can also report verbs (v), adverbs (r) and adjectives and adjective satellites.

Glossy Wordnet

Now let's retrieve a 'gloss'. A gloss is a natural language sentence illustrating the use of the particular word-sense.

First we load the gloss module.

[wn_g].

Then, we can take a word-sense ID and retrieve the gloss:

g(101064148, Gloss).

Which shows:

Gloss = 'freedom from activity (work or strain or responsibility); "took his repose by the swimming pool"'.

In the next Prolog and Wordnet article, I'll go through some more advanced ways of using Wordnet with Prolog.