NLTK Named Entity Recognition with Custom Data

Solution 1:

Are you committed to using NLTK/Python? I ran into the same problems as you, and had much better results using Stanford's named-entity recognizer: http://nlp.stanford.edu/software/CRF-NER.shtml. The process for training the classifier using your own data is very well-documented in the FAQ.

If you really need to use NLTK, I'd hit up the mailing list for some advice from other users: http://groups.google.com/group/nltk-users.

Hope this helps!

Solution 2:

You can easily use Stanford NER along with NLTK. A Python script looks like this:

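import os
from nltk.tag.stanford import NERTagger  # newer NLTK releases call this class StanfordNERTagger

# NLTK reads JAVAHOME to locate the Java binary; adjust the path for your system.
java_path = "/Java/jdk1.8.0_45/bin/java.exe"
os.environ['JAVAHOME'] = java_path

# Arguments: path to the trained model, then path to the Stanford NER jar.
st = NERTagger('../ner-model.ser.gz', '../stanford-ner.jar')
text = "Rami Eid is studying at Stony Brook University in NY"  # sample input
tagging = st.tag(text.split())  # list of (token, tag) tuples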
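The tagger returns one (token, tag) pair per token, so a multi-word name comes back as several pairs. Here is a minimal sketch of merging them into whole entities. The helper name `group_entities` and the sample list are made up for illustration, and it assumes untagged tokens carry the tag "O", which is what the stock Stanford models emit.

```python
from itertools import groupby


def group_entities(tagged_tokens, outside_tag="O"):
    """Merge consecutive tokens that share a non-outside tag into (phrase, tag) pairs."""
    entities = []
    for tag, group in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if tag != outside_tag:
            phrase = " ".join(token for token, _ in group)
            entities.append((phrase, tag))
    return entities


# Hand-written sample in the shape st.tag() returns; not real tagger output.
sample = [
    ("Barack", "PERSON"),
    ("Obama", "PERSON"),
    ("visited", "O"),
    ("Paris", "LOCATION"),
    (".", "O"),
]
print(group_entities(sample))
```

Note that two different entities of the same type standing right next to each other would be merged into one, since the plain tags carry no begin/inside marker.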

To train a model on your own data, see the first question in the Stanford NER FAQ:

http://nlp.stanford.edu/software/crf-faq.shtml
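As far as I understand the FAQ, the training file is plain text with one token per line, a tab, and then the label, with a blank line between sentences. A rough sketch of writing labeled data in that shape follows. The function name `to_stanford_tsv` and the labels are my own placeholders, so check the FAQ for the exact layout your properties file expects.

```python
def to_stanford_tsv(sentences):
    """Render labeled sentences as tab-separated 'token<TAB>label' lines.

    sentences: list of sentences, each a list of (token, label) pairs.
    A blank line is written after every sentence (assumed separator).
    """
    lines = []
    for sentence in sentences:
        for token, label in sentence:
            lines.append(f"{token}\t{label}")
        lines.append("")  # blank line between sentences
    return "\n".join(lines)


# Hypothetical training data; labels are placeholders.
training = [
    [("Acme", "ORG"), ("hired", "O")],
    [("Bob", "PERSON")],
]
print(to_stanford_tsv(training))
```

The resulting string can be written to a file and referenced as the training file when you run the trainer described in the FAQ.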

Solution 3:

I also had this issue, but I managed to work it out. You can use your own training data. I documented the main requirements and steps for this in my GitHub repository.

I used NLTK-trainer, so basically you have to get the training data into the right format (each token followed by its POS tag and IOB entity tag, e.g. token NNP B-tag) and run the training script. Check my repository for more info.
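To illustrate the "token NNP B-tag" format, here is a small sketch that turns POS-tagged tokens plus entity spans into IOB lines. The function `spans_to_iob`, the space separator, and the PER/ORG labels are assumptions for the example, not something taken from NLTK-trainer itself.

```python
def spans_to_iob(pos_tagged, spans):
    """Build 'token POS IOB-tag' lines.

    pos_tagged: list of (token, pos) pairs for one sentence.
    spans: list of (start, end, label) with end exclusive, indexing tokens.
    """
    iob = ["O"] * len(pos_tagged)
    for start, end, label in spans:
        iob[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            iob[i] = f"I-{label}"          # remaining tokens of the entity
    return [f"{tok} {pos} {tag}" for (tok, pos), tag in zip(pos_tagged, iob)]


# Hypothetical sentence and annotations.
sentence = [("John", "NNP"), ("Smith", "NNP"), ("works", "VBZ"), ("at", "IN"), ("Acme", "NNP")]
entity_spans = [(0, 2, "PER"), (4, 5, "ORG")]
for line in spans_to_iob(sentence, entity_spans):
    print(line)
```

Unlike plain per-token labels, the B-/I- prefixes keep two adjacent entities of the same type distinguishable.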
