How to use Wordnet 3.1 with NLTK on Python?
Solution 1:
After a lot of searching and trial and error, I was able to use WordNet 3.1 with NLTK in Python. I tweaked this gist to make it work; the details are below.
I divided the code from the gist into three parts.
Part 1. download_extract.py
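import os

# The trailing slash matters: the paths below are built by string concatenation.
nltkdata_wn = '/path/to/nltk_data/corpora/wordnet/'
# Sibling folder for the backup, e.g. .../corpora/wordnet_3.0
backup_wn = nltkdata_wn.rstrip('/') + '_3.0'
wn31 = "http://wordnetcode.princeton.edu/wn3.1.dict.tar.gz"

# Back up the existing WordNet 3.0 files (only if no backup exists yet).
if not os.path.exists(backup_wn):
    os.mkdir(backup_wn)
    os.system('mv ' + nltkdata_wn + "* " + backup_wn + "/")

# Download the WordNet 3.1 database unless the archive is already present.
if not os.path.exists('wn3.1.dict.tar.gz'):
    os.system('wget ' + wn31)

# Extract into the wordnet folder, then flatten the extracted dict/ subfolder.
os.system("tar zxf wn3.1.dict.tar.gz -C " + nltkdata_wn)
os.system("mv " + nltkdata_wn + "dict/* " + nltkdata_wn)
os.rmdir(nltkdata_wn + 'dict')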
This backs up the existing WordNet 3.0 folder from wordnet to wordnet_3.0, downloads the WordNet 3.1 database, and puts it in the wordnet folder. Since I am on a Windows system, I did this step manually.
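On a system without mv, wget, or tar (Windows, for example), the same steps can probably be done with the standard library alone. The following is only a sketch under assumptions: the function names backup_wordnet and install_from_archive are made up for illustration, the backup folder is assumed to sit next to the wordnet folder rather than inside it, and the archive is assumed to unpack into a single dict/ folder, as the script above expects.

```python
import shutil
import tarfile
from pathlib import Path


def backup_wordnet(wn_dir, backup_dir):
    """Move every entry of wn_dir into backup_dir, unless a backup already exists.

    Hypothetical helper; backup_dir is assumed to lie outside wn_dir.
    """
    wn_dir, backup_dir = Path(wn_dir), Path(backup_dir)
    if backup_dir.exists():
        return False  # leave the earlier backup untouched
    backup_dir.mkdir(parents=True)
    # Take a snapshot of the listing first, since entries are moved while looping.
    for entry in list(wn_dir.iterdir()):
        shutil.move(str(entry), str(backup_dir / entry.name))
    return True


def install_from_archive(archive, wn_dir):
    """Unpack a .tar.gz archive into wn_dir and flatten its dict/ folder.

    Hypothetical helper; assumes the archive contains one top-level dict/ folder.
    """
    wn_dir = Path(wn_dir)
    # Newer Python versions offer extraction filters; use one when available.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(wn_dir, **kwargs)
    dict_dir = wn_dir / "dict"
    for entry in list(dict_dir.iterdir()):
        shutil.move(str(entry), str(wn_dir / entry.name))
    dict_dir.rmdir()


# Usage (paths are placeholders, as in the script above):
# backup_wordnet('/path/to/nltk_data/corpora/wordnet',
#                '/path/to/nltk_data/corpora/wordnet_3.0')
# install_from_archive('wn3.1.dict.tar.gz', '/path/to/nltk_data/corpora/wordnet')
```

The archive itself can be fetched beforehand with urllib.request.urlretrieve(wn31, 'wn3.1.dict.tar.gz'), or simply downloaded in a browser.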
Part 2. create_lexnames.py
import os
nltkdata_wn = '/path/to/nltk_data/corpora/wordnet/'
dbfiles = nltkdata_wn+'dbfiles'
with open(nltkdata_wn+'lexnames', 'w') as fout:
    # One row per file in dbfiles: zero-padded index, file name, category code.
    for i,j in enumerate(sorted(os.listdir(dbfiles))):
        # The part before the first dot is the part of speech, e.g. "noun".
        pos = j.partition('.')[0]
        if pos == "noun":
            syncat = 1
        elif pos == "verb":
            syncat = 2
        elif pos == "adj":
            syncat = 3
        elif pos == "adv":
            syncat = 4
        elif j == "cntlist":
            syncat = "cntlist"
        fout.write("\t".join([str(i).zfill(2),j,str(syncat)])+"\n")
This creates the required lexnames file in the wordnet folder.
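To see what the rows of that file look like without touching the disk, the same mapping can be applied to a plain list of names. This is a minimal sketch: lexnames_lines is a hypothetical helper, the names in sample are just a hand-picked sample rather than the real contents of dbfiles, and unlike the script above it raises a KeyError on a name it does not recognise.

```python
def lexnames_lines(filenames):
    """Return the tab-separated rows that Part 2 would write for these names."""
    pos_codes = {"noun": 1, "verb": 2, "adj": 3, "adv": 4}
    lines = []
    for index, name in enumerate(sorted(filenames)):
        prefix = name.partition(".")[0]
        # cntlist has no part of speech; Part 2 writes its own name as the code.
        syncat = "cntlist" if name == "cntlist" else pos_codes[prefix]
        lines.append("\t".join([str(index).zfill(2), name, str(syncat)]))
    return lines


# Sample input only; the real list comes from os.listdir(dbfiles).
sample = ["noun.animal", "verb.motion", "adj.all", "adv.all", "cntlist"]
for line in lexnames_lines(sample):
    print(line)
```

Each row is an index, a file name, and a category code, with the index following the sorted order of the file names.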
Part 3. testing_wn31.py
from nltk.corpus import wordnet as wn
nltkdata_wn = '/path/to/nltk_data/corpora/wordnet/'
# Check the generated lexnames file: each index must match its line number.
with open(nltkdata_wn + 'lexnames', 'r') as fin:
    for i, line in enumerate(fin):
        index, lexname, _ = line.split()
        assert int(index) == i
# Test the WordNet functions.
print(wn.synsets('dog'))
for i in wn.all_synsets():
    print(i, i.pos(), i.definition())
This checks the generated lexnames file and also tests whether the WordNet functions work.
Once I was done with this procedure, I ran the following code in Python and confirmed that it is actually running version 3.1:
>>> from nltk.corpus import wordnet
>>> wordnet.get_version()
'3.1'
A Word of Caution
Once you have replaced the database with WordNet 3.1, you'll notice that if you run the following code
>>> import nltk
>>> nltk.download()
then in the download dialog box, under the Corpora tab, Wordnet will be shown as out of date. Do not try to update it: doing so will either revert WordNet to version 3.0 or break it.
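Since an update from the downloader can put version 3.0 back, it may be worth checking the version at the start of any script that depends on 3.1. A minimal sketch, assuming only the get_version() call shown above; require_wordnet_version is a hypothetical helper name, not part of NLTK.

```python
def require_wordnet_version(reader, expected="3.1"):
    """Raise if the loaded WordNet data is not the expected version.

    reader is any object with a get_version() method, such as
    nltk.corpus.wordnet. Hypothetical helper, not an NLTK function.
    """
    found = reader.get_version()
    if found != expected:
        raise RuntimeError(f"expected WordNet {expected}, found {found}")
    return found


# Usage (needs NLTK and the replaced corpus on disk):
# from nltk.corpus import wordnet
# require_wordnet_version(wordnet)
```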