Natural Language Processing in Ruby [closed]
Three excellent and mature NLP packages are Stanford CoreNLP, OpenNLP and LingPipe. There are Ruby bindings to the Stanford CoreNLP tools (GPL license) as well as the OpenNLP tools (Apache License).
On the more experimental side of things, I maintain a Text Retrieval, Extraction and Annotation Toolkit (Treat), released under the GPL, that provides a common API for almost every NLP-related gem that exists for Ruby. The following list of Treat's features can also serve as a good reference in terms of stable natural language processing gems compatible with Ruby 1.9.
- Text segmenters and tokenizers (punkt-segmenter, tactful_tokenizer, srx-english, scalpel).
- Natural language parsers for English, French and German and named entity extraction for English (stanford-core-nlp).
- Word inflection and conjugation (linguistics), stemming (ruby-stemmer, uea-stemmer, lingua, etc.).
- WordNet interface (rwordnet), POS taggers (rbtagger, engtagger, etc.).
- Language (whatlanguage), date/time (chronic, kronic, nickel), keyword (lda-ruby) extraction.
- Text retrieval with indexation and full-text search (ferret).
- Named entity extraction (stanford-core-nlp).
- Basic machine learning with decision trees (decisiontree), MLPs (ruby-fann), SVMs (rb-libsvm) and linear classification (tomz-liblinear-ruby-swig).
- Text similarity metrics (levenshtein-ffi, fuzzy-string-match, tf-idf-similarity).
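To give a feel for what the text similarity gems in the list compute, here is a minimal plain-Ruby sketch of Levenshtein (edit) distance. It is not the API of levenshtein-ffi or any other gem mentioned here; the method names `levenshtein` and `similarity` are made up for illustration, and a native gem will be much faster on long strings.

```ruby
# Edit distance via dynamic programming, keeping only one row at a time.
# Hypothetical helper, not taken from any gem listed above.
def levenshtein(a, b)
  a_chars = a.chars
  b_chars = b.chars
  return b_chars.length if a_chars.empty?
  return a_chars.length if b_chars.empty?

  # prev[j] is the distance between the first i chars of a and the first j chars of b.
  prev = (0..b_chars.length).to_a
  a_chars.each_with_index do |ca, i|
    curr = [i + 1]
    b_chars.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j + 1] + 1, # deletion
               curr[j] + 1,     # insertion
               prev[j] + cost].min # substitution (or match)
    end
    prev = curr
  end
  prev.last
end

# Normalised similarity in the range 0.0..1.0 (1.0 means identical strings).
def similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?

  1.0 - levenshtein(a, b).to_f / longest
end

puts levenshtein("kitten", "sitting") # => 3
```

Dividing by the longer string's length is only one possible normalisation; other metrics (Jaro-Winkler, tf-idf cosine similarity) weigh differences in other ways.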
Not included in Treat, but relevant to NLP: hotwater (string distance algorithms), yomu (bindings to Apache Tika for reading .doc, .docx, .pages, .odt, .rtf, .pdf), graph-rank (an implementation of GraphRank).
There are some things at Ruby Linguistics and some links therefrom, though it doesn't seem anywhere close to what NLTK is for Python, yet.
You can always use JRuby and call the Java libraries directly.
EDIT: The ability to run Ruby natively on the JVM and easily leverage Java libraries is a big plus for Rubyists. This is a good option that should be considered in a situation like this.
For the Java side, see the related question: Which NLP toolkit to use in JAVA?
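As a rough sketch of the JRuby route, the snippet below calls a class that ships with the JDK (`java.text.BreakIterator`) to split text into sentences. It only stands in for a real NLP library, which you would load from its jar in the same way. The helper names `jruby?` and `split_sentences` are hypothetical, and the code assumes JRuby's `Java::` namespace for reaching Java classes; on MRI it just raises.

```ruby
# True when running on JRuby (RUBY_PLATFORM is "java" there).
def jruby?
  RUBY_PLATFORM == "java"
end

# Hypothetical helper: sentence splitting with the JDK's BreakIterator.
# Assumes JRuby; raises NotImplementedError on any other Ruby.
def split_sentences(text)
  raise NotImplementedError, "split_sentences needs JRuby" unless jruby?

  require "java"
  break_iterator = Java::JavaText::BreakIterator
  iterator = break_iterator.getSentenceInstance(Java::JavaUtil::Locale::US)
  iterator.setText(text)

  sentences = []
  start = iterator.first
  loop do
    stop = iterator.next
    break if stop == break_iterator::DONE

    # Boundaries are offsets into the text; strip trailing whitespace.
    sentences << text[start...stop].strip
    start = stop
  end
  sentences
end

if jruby?
  puts split_sentences("JRuby runs on the JVM. Java libraries are one call away.")
end
```

The same pattern (require the jar, then reference the class through `Java::` or `java_import`) is how you would reach a third-party Java toolkit, though the exact class names depend on the toolkit.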