Is there software that performs textual analysis on blogs? [closed]

My company is looking to create a PivotViewer visualization of a client's WordPress 2 blog posts from the last 11 years. To do so, however, we need to clean up the somewhat haphazard, incomplete, and generally poor tags so they can serve as sortable categories. I'm looking for a tool that will analyze their blog entries and perform word counting, to give us a sense of what we're dealing with.

Ideally, it would have all of these features:

  1. Word blacklisting (ignore)
  2. Word stemming
  3. Custom synonym merging
  4. Counting all uses
  5. Counting number of posts a word appears in.

I would have thought that this sort of textual analysis would be extremely common, but I haven't been able to find any software that does this sort of thing on entire blogs. Is there software available to do this?


Solution 1:

The kind of software you are looking for goes by many names, such as "content analysis", "tag cloud", "meta tags", "text analysis", and "text mining".

There are a great many software tools for these purposes, both free and commercial.

I do not have personal experience with such tools, but a good place to start is Text Analysis Tools, which lists dozens of them.

Another such list is Text Analysis, Text Mining, and Information Retrieval Software.

Solution 2:

Take a look at RapidMiner or Weka.

Seeing as it's a client's blog, you probably have database access. Download all the articles as plain text and use one of the above programs to handle the natural language processing requirements (1, 2, 3, and 5).

Counting all uses (4) is hard to fully automate, since it depends on determining the meaning of each word from its context.
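If you end up scripting part of this yourself, the five features from the question can be roughed out with nothing but the Python standard library. This is only a minimal sketch under some assumptions: the blacklist, the synonym map, the sample posts, and the helper names (`crude_stem`, `normalize`, `count_words`) are all made up for illustration, and the suffix stripper is a crude stand-in for a real stemmer such as a Porter stemmer. It counts surface forms only and does nothing about word meaning in context.

```python
import re
from collections import Counter

# Hypothetical word lists: replace with ones that suit the client's blog.
BLACKLIST = {"the", "a", "an", "and", "of", "to", "in", "is", "on"}
SYNONYMS = {"photo": "picture", "pic": "picture"}  # variant -> canonical form


def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        # Only strip if at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, tokenize, drop blacklisted words, stem, merge synonyms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    result = []
    for token in tokens:
        if token in BLACKLIST:
            continue
        stem = crude_stem(token)
        result.append(SYNONYMS.get(stem, stem))
    return result


def count_words(posts):
    """Return (total uses per word, number of posts containing each word)."""
    total_uses = Counter()
    post_counts = Counter()
    for post in posts:
        words = normalize(post)
        total_uses.update(words)       # every occurrence counts
        post_counts.update(set(words))  # each word counts once per post
    return total_uses, post_counts


# Made-up sample posts standing in for the exported blog entries.
posts = [
    "Posting photos of the cats, more cats",
    "A picture of a cat and a dog",
]
total_uses, post_counts = count_words(posts)
print(total_uses.most_common(5))
print(post_counts.most_common(5))
```

With the real export you would pass one string per post to `count_words` and then sort either counter to see which terms are worth promoting to categories. Keeping the two counters separate shows whether a word is used heavily in a few posts or spread thinly across many.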

Solution 3:

One content analysis package worth a look is WordStat, developed by Provalis Research.

WordStat is a text analysis module for QDA Miner or SimStat. It combines a dictionary-based approach to content analysis with various exploratory text mining methods. WordStat can apply existing categorization dictionaries to a new text corpus, and it can also be used to develop and validate new categorization dictionaries. When used in conjunction with manual coding, the module can support a more systematic application of coding rules, help uncover differences in word usage between subgroups of individuals, and assist in revising existing coding using KWIC (Keyword In Context) tables. WordStat is specifically designed to study textual information such as responses to open-ended questions, interviews, titles, journal articles, public speeches, electronic communications, etc.

http://provalisresearch.com/products/content-analysis-software/