Automatic parsing of citation text in academic references

Solution 1:

At the moment (2017), the most active open-source project implementing this seems to be AnyStyle Parser (last version 07-2016). It can be used through a web interface or an API, or installed as a RubyGem.

They explicitly mention on their website that the implementation is inspired by ParsCit (last version 2013?) and FreeCite (last commit 2009).

Also from their website:

AnyStyle Parser uses powerful machine learning heuristics based on Conditional Random Fields that can be trained by everyone using our built-in editor.

That is a really cool feature, and it makes this the most interesting implementation (imho). Training seems to be pretty straightforward, as explained in the API documentation: you provide some manually corrected results and run the Anystyle.parser.train command. I am not sure whether ParsCit and FreeCite also support this, but if they don't, this seems like a huge feature difference to me.
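
To make the idea of "manually corrected results" more concrete, here is a minimal Python sketch of how a hand-tagged reference can be turned into the token/label pairs that a Conditional Random Field learns from. This is not AnyStyle's code or its actual training format: the tag names (author, title, location, publisher, date), the sample reference, and the to_labeled_tokens helper are all assumptions made up for illustration.

```python
import re

# Hypothetical hand-corrected reference. The tag names are assumptions,
# not necessarily the labels any particular parser uses.
TAGGED = (
    "<author>Poe, Edgar A.</author> "
    "<title>Essays and Reviews.</title> "
    "<location>New York:</location> "
    "<publisher>Library of America,</publisher> "
    "<date>1984.</date>"
)

# Matches one <label>text</label> segment; \1 forces the closing tag
# to have the same name as the opening tag.
SEGMENT = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def to_labeled_tokens(tagged):
    """Split each tagged segment on whitespace and pair every token with its label."""
    pairs = []
    for label, text in SEGMENT.findall(tagged):
        for token in text.split():
            pairs.append((token, label))
    return pairs


labeled = to_labeled_tokens(TAGGED)
for token, label in labeled:
    print(f"{token}\t{label}")
```

A sequence model is then trained on many such labeled sequences, so correcting a few wrong parses by hand and retraining is what lets the parser adapt to citation styles it previously got wrong.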

Solution 2:

Take a look at this list of citation parsers that can generate XML from input text:

http://paracite.eprints.org
http://aye.comp.nus.edu.sg/parsCit (in maintenance mode as of Aug 1, 2012)
http://opcit.eprints.org
http://search.cpan.org/~mjewell/Biblio-Citation-Parser-1.10