Unsupervised Sentiment Analysis

Solution 1:

A classic paper by Peter Turney (2002) explains a method to do unsupervised sentiment analysis (positive/negative classification) using only the words excellent and poor as a seed set. Turney uses the mutual information of other words with these two adjectives to achieve an accuracy of 74%.

Solution 2:

I haven't tried doing untrained sentiment analysis such as you are describing, but off the top of my head I'd say you're oversimplifying the problem. Simply analyzing adjectives is not enough to get a good grasp of the sentiment of a text; for example, consider the word 'stupid.' Alone, you would classify that as negative, but if a product review were to have '... [x] product makes their competitors look stupid for not thinking of this feature first...' then the sentiment in there would definitely be positive. The greater context in which words appear definitely matters in something like this. This is why an untrained bag-of-words approach alone (let alone an even more limited bag-of-adjectives) is not enough to tackle this problem adequately.

The pre-classified data ('training data') helps in that the problem shifts from trying to determine whether a text is of positive or negative sentiment from scratch, to trying to determine if the text is more similar to positive texts or negative texts, and classify it that way. The other big point is that textual analyses such as sentiment analysis are often affected greatly by the differences of the characteristics of texts depending on domain. This is why having a good set of data to train on (that is, accurate data from within the domain in which you are working, and is hopefully representative of the texts you are going to have to classify) is as important as building a good system to classify with.

Not exactly an article, but hope that helps.

Solution 3:

The paper of Turney (2002) mentioned by larsmans is a good basic one. In a newer research, Li and He [2009] introduce an approach using Latent Dirichlet Allocation (LDA) to train a model that can classify an article's overall sentiment and topic simultaneously in a totally unsupervised manner. The accuracy they achieve is 84.6%.

Solution 4:

I tried several methods of Sentiment Analysis for opinion mining in Reviews. What worked the best for me is the method described in Liu book: http://www.cs.uic.edu/~liub/WebMiningBook.html In this Book Liu and others, compared many strategies and discussed different papers on Sentiment Analysis and Opinion Mining.

Although my main goal was to extract features in the opinions, I implemented a sentiment classifier to detect positive and negative classification of this features.

I used NLTK for the pre-processing (Word tokenization, POS tagging) and the trigrams creation. Then also I used the Bayesian Classifiers inside this tookit to compare with other strategies Liu was pinpointing.

One of the methods relies on tagging as pos/neg every trigrram expressing this information, and using some classifier on this data. Other method I tried, and worked better (around 85% accuracy in my dataset), was calculating the sum of scores of PMI (punctual mutual information) for every word in the sentence and the words excellent/poor as seeds of pos/neg class.