Validate (X)HTML in Python

Solution 1:

PyTidyLib is a nice Python binding for HTML Tidy. Their example:

from tidylib import tidy_document
document, errors = tidy_document('''<p>f&otilde;o <img src="bar.jpg">''',
    options={'numeric-entities':1})
print(document)
print(errors)

Moreover, it's compatible with both legacy HTML Tidy and the new tidy-html5.

Solution 2:

XHTML is easy: use lxml.

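from io import StringIO
from lxml import etree
# html is a str holding the markup to check; with recover=False, lxml raises
# etree.XMLSyntaxError on a parse error instead of silently repairing the markup
etree.parse(StringIO(html), etree.HTMLParser(recover=False))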
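
If you want the individual problems rather than just an exception, one option is to read the parser's error log. The sketch below is only an illustration: it swaps in etree.XMLParser (treating XHTML as plain XML) rather than the HTMLParser used above, the helper name check_well_formed is made up, and it checks well-formedness only, not validity against the XHTML DTD.

```python
from lxml import etree


def check_well_formed(xhtml_bytes):
    """Return a list of (line, column, message) tuples; empty if the input parses.

    Hypothetical helper: checks XML well-formedness only, not DTD validity.
    """
    parser = etree.XMLParser(recover=False)
    try:
        etree.fromstring(xhtml_bytes, parser)
    except etree.XMLSyntaxError:
        # The parser keeps a log of the errors it hit during the failed parse.
        return [(e.line, e.column, e.message) for e in parser.error_log]
    return []


if __name__ == "__main__":
    broken = b"<html><body><p>unclosed</body></html>"
    for line, column, message in check_well_formed(broken):
        print(line, column, message)
```

Note that named entities such as &nbsp; will be reported as errors here, because no DTD is loaded to define them.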

HTML is harder, since there has traditionally not been as much interest in validation among the HTML crowd (run Stack Overflow itself through a validator, yikes). The easiest solution would be to execute an external application such as nsgmls or OpenJade and then parse its output.
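
A rough sketch of the "run it and parse the output" approach, under several assumptions: nsgmls is on your PATH with its SGML catalog and DTDs already set up, it reports problems on stderr, and each message looks like program:file:line:column:severity: text. The function names and the regular expression are my own, so adjust them to whatever your build actually prints.

```python
import re
import subprocess

# Assumed message shape: "<program>:<file>:<line>:<column>:<severity>: <text>"
# The severity letter (E, W, ...) is treated as optional.
MESSAGE_RE = re.compile(
    r'^[^:]+:(?P<file>.+?):(?P<line>\d+):(?P<column>\d+):'
    r'(?:(?P<severity>[A-Z]):)?\s*(?P<text>.*)$'
)


def parse_messages(output):
    """Turn validator output into a list of dicts, skipping unrecognised lines."""
    messages = []
    for raw in output.splitlines():
        match = MESSAGE_RE.match(raw)
        if match is None:
            continue
        messages.append({
            "file": match.group("file"),
            "line": int(match.group("line")),
            "column": int(match.group("column")),
            "severity": match.group("severity"),
            "text": match.group("text"),
        })
    return messages


def run_nsgmls(path, executable="nsgmls"):
    """Run the validator on a file and return its parsed messages.

    -s suppresses the normal parse output so that only messages are printed.
    """
    proc = subprocess.run([executable, "-s", path],
                          capture_output=True, text=True)
    return parse_messages(proc.stderr)
```

Calling run_nsgmls("page.html") would then give you a list you can filter by severity; an empty list means the validator printed nothing that matched the assumed format.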