Validate (X)HTML in Python
Solution 1:
PyTidyLib is a nice Python binding for HTML Tidy. Their example:
from tidylib import tidy_document
# tidy_document returns the cleaned-up markup and a string of warnings/errors
document, errors = tidy_document('''<p>fõo <img src="bar.jpg">''',
    options={'numeric-entities': 1})
print(document)
print(errors)
Moreover, it's compatible with both legacy HTML Tidy and the new tidy-html5.
Solution 2:
XHTML is easy: use lxml.
from io import StringIO  # Python 2: from StringIO import StringIO
from lxml import etree
# html is a string holding the markup to check
etree.parse(StringIO(html), etree.HTMLParser(recover=False))
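If you want the problems back as a list instead of an exception, something along these lines might work. This is only a sketch: the function name xhtml_syntax_errors is made up for illustration, and it uses lxml's strict XMLParser rather than the HTMLParser from the snippet above, so it only checks that the XHTML is well-formed XML. It does not validate against the XHTML DTD.

```python
from lxml import etree


def xhtml_syntax_errors(markup):
    """Return parse error messages; an empty list means well-formed XML.

    Hypothetical helper: checks well-formedness only, not DTD validity.
    """
    parser = etree.XMLParser(recover=False)
    try:
        # Encode to bytes so an XML declaration with an encoding is accepted.
        etree.fromstring(markup.encode("utf-8"), parser)
    except etree.XMLSyntaxError as exc:
        messages = [
            "line %d, column %d: %s" % (entry.line, entry.column, entry.message)
            for entry in parser.error_log
        ]
        # Fall back to the exception text if the log happens to be empty.
        return messages or [str(exc)]
    return []
```

Calling xhtml_syntax_errors(html) on a page with an unclosed tag should give at least one message, and a well-formed page should give an empty list.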
HTML is harder, since there's traditionally not been as much interest in validation among the HTML crowd (run StackOverflow itself through a validator, yikes). The easiest solution would be to execute external applications such as nsgmls or OpenJade, and then parse their output.
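Running an external validator and collecting its output could look roughly like this. It is a sketch under assumptions: run_validator is a made-up name, the tool is assumed to read the document on stdin and report problems on stderr, and the actual command line for nsgmls or OpenJade (flags, catalog and DTD setup) is left to you because it depends on your installation.

```python
import subprocess


def run_validator(command, markup):
    """Pipe markup to an external command; return (exit_code, message_lines).

    Hypothetical helper: assumes the tool reads stdin and writes
    its diagnostics to stderr, one message per line.
    """
    result = subprocess.run(
        command,
        input=markup,
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit usually just means "errors found"
    )
    lines = [line for line in result.stderr.splitlines() if line.strip()]
    return result.returncode, lines
```

The returned lines are raw messages. How to split them into file, line, and column depends on the validator you pick, so that part is not shown here.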