How to train the Stanford NLP Sentiment Analysis tool

What is the significance of, and the difference between, each file (train.txt / dev.txt / test.txt)?

This is standard machine learning terminology. The train set is used to (surprise surprise) train a model. The development set is used to tune any parameters the model might have. What you would normally do is pick a parameter value, train a model on the training set, and then check how well the trained model does on the development set. You then pick another parameter value and repeat. This procedure helps you find reasonable parameter values for your model.

Once this is done, you proceed to test how well the model does on the test set. This data is unseen: your model has never encountered any of it before. It is important that the test set is kept separate from the training and development sets, otherwise you are effectively evaluating a model on data it has seen before. That would be wrong, as it will not give you an idea of how well the model really does.
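The tune-on-dev, evaluate-once-on-test discipline above can be sketched as a loop. This is a minimal illustration, not Stanford code: the model, the "accuracy" function, and the candidate values are all made up stand-ins for a real learner.

```java
// A minimal sketch of the train/dev/test discipline described above.
// Everything here is hypothetical: devAccuracy() stands in for
// "train on the train set with this parameter, then score on the dev set".
public class SplitDemo {
    // Hypothetical dev-set accuracy for a candidate parameter value.
    static double devAccuracy(double param) {
        return 1.0 - Math.abs(param - 0.1); // pretend 0.1 is the sweet spot
    }

    // Pick the candidate parameter that scores best on the dev set.
    static double tune(double[] candidates) {
        double best = candidates[0];
        for (double c : candidates) {
            if (devAccuracy(c) > devAccuracy(best)) best = c;
        }
        return best;
    }

    public static void main(String[] args) {
        double best = tune(new double[] {0.001, 0.01, 0.1, 1.0});
        System.out.println("picked parameter: " + best);
        // Only now, with the parameter fixed, would you evaluate
        // the model a single time on the held-out test set.
    }
}
```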

How would I train my own model with a raw, unparsed text file full of tweets?

You can't and you shouldn't train using an unparsed set of documents. The entire point of the recursive deep model (and the reason it performs so well) is that it can learn from the sentiment annotations at every level of the parse tree. The sentence you have given above can be formatted like this:

(4 
    (4 
        (2 A) 
        (4 
            (3 (3 warm) (2 ,)) (3 funny)
        )
    ) 
    (3 
        (2 ,) 
        (3 
            (4 (4 engaging) (2 film)) (2 .)
        )
    )
)

Usually, a sentiment analyser is trained with document-level annotations: you have only one score, and this score applies to the document as a whole, ignoring the fact that the phrases in the document may express different sentiment. The Stanford team put a lot of effort into annotating every phrase in the document for sentiment. For example, the word film on its own is neutral in sentiment: (2 film). However, the phrase engaging film is very positive: (4 (4 engaging) (2 film))
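Reading those labels off a tree string is mechanical: each node is written as (label children), with labels running from 0 (very negative) to 4 (very positive). Here is a minimal sketch (not CoreNLP code; CoreNLP's own Tree class does this properly) that pulls the root sentiment label out of a labelled tree string:

```java
// A minimal sketch (not CoreNLP code) of reading the labelled trees
// shown above. Each node is "(label child child...)" or "(label token)";
// labels run 0 (very negative) through 4 (very positive).
public class TreeLabel {
    // Returns the sentiment label of the root node of a tree string.
    static int rootLabel(String tree) {
        int i = tree.indexOf('(') + 1;
        while (Character.isWhitespace(tree.charAt(i))) i++; // skip spaces
        return Character.getNumericValue(tree.charAt(i));   // the 0-4 label
    }

    public static void main(String[] args) {
        System.out.println(rootLabel("(4 (4 engaging) (2 film))")); // 4
        System.out.println(rootLabel("(2 film)"));                  // 2
    }
}
```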

If you have labelled tweets, you can use any other document-level sentiment classifier. The sentiment-analysis tag on Stack Overflow already has some very good answers, so I'm not going to repeat them here.

PS Did you label the tweets you have? All 1 million of them? If you did, I'd like to pay you a lot of money for that file :)


The Java code:

BuildBinarizedDataset -> http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/sentiment/BuildBinarizedDataset.html

SentimentTraining -> http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/sentiment/SentimentTraining.html
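Assuming the CoreNLP jars are on the classpath, the usual invocation of these two classes looks roughly like the following. Treat this as a sketch: the file names and memory settings are made-up placeholders, and you should check the flag names against the Javadoc pages above for your CoreNLP version.

    # Parse and binarize a labelled text file into sentiment trees
    # (hypothetical file names; the trees are written to stdout).
    java -mx4g edu.stanford.nlp.sentiment.BuildBinarizedDataset \
        -input labelled_tweets.txt > train.txt

    # Train a sentiment model on the binarized trees.
    java -mx8g edu.stanford.nlp.sentiment.SentimentTraining \
        -numHid 25 -trainPath train.txt -devPath dev.txt \
        -train -model model.ser.gz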

For those who code in C#, I converted the Java source into two code files which should make understanding this process a lot simpler.

https://arachnode.net/blogs/arachnode_net/archive/2015/09/03/buildbinarizeddataset-and-sentimenttraining-stanford-nlp.aspx


If it helps, I got the C# code from Arachnode working very easily: a tweak or two to get the right paths for the models and so on, but it then works great. What was missing was documentation of the right format for the input files. It's in the Javadoc, but for reference, for BuildBinarizedDataset it's something like:

2 line of text here
0 another line of text
1 yet another line of text

and so on, one labelled example per line.

Building that file is pretty trivial, depending on what you're starting with (a database, an Excel file, whatever).
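As a sketch of that step, the following produces the "label, space, text" lines shown above from in-memory pairs. The labels and sentences are made up; in practice you would read them from your database or spreadsheet export.

```java
import java.util.*;

// A minimal sketch of producing the BuildBinarizedDataset input format:
// one example per line, the 0-4 label first, then a space, then the text.
// The examples here are hypothetical.
public class LabelledLines {
    static String toInputFormat(List<Map.Entry<Integer, String>> examples) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, String> e : examples) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> examples = List.of(
            Map.entry(2, "line of text here"),
            Map.entry(0, "another line of text"));
        System.out.print(toInputFormat(examples));
    }
}
```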