Equivalent to InnerHTML when using lxml.html to parse HTML
Solution 1:
Sorry for bringing this up again, but I've been looking for a solution and yours contains a bug:
<body>This text is ignored
<h1>Title</h1><p>Some text</p></body>
Text directly under the root element is ignored. I ended up doing this:
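# assumes: from lxml import html
# encoding='unicode' makes tostring() return str, so it can be joined with body.text on Python 3
(body.text or '') + ''.join(
    html.tostring(child, encoding='unicode') for child in body.iterchildren())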
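For reference, here is a minimal self-contained sketch of that approach, assuming Python 3 and a reasonably recent lxml. The helper name inner_html is my own and is not part of lxml:

```python
from lxml import html


def inner_html(element):
    """Return the markup inside `element`: its leading text plus each serialized child."""
    # element.text is the text before the first child (None when there is none).
    # html.tostring() includes each child's tail text by default, so text
    # between and after the children is preserved as well.
    return (element.text or '') + ''.join(
        html.tostring(child, encoding='unicode') for child in element.iterchildren()
    )


# Hypothetical sample document with text directly under <body>.
doc = html.fromstring(
    '<html><body>This text is kept\n<h1>Title</h1><p>Some text</p></body></html>'
)
body = doc.find('body')
print(inner_html(body))
```

The text directly under body comes from element.text, which is why it has to be added separately from the serialized children.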
Solution 2:
You can get the children of an element using the iterchildren() method of the root node (getchildren() also works, but is deprecated). Don't use iterdescendants() for this: it also yields nested elements, so their markup would be serialized more than once.
>>> from lxml import etree
>>> from io import StringIO
>>> t = etree.parse(StringIO("""<body>
... <h1>A title</h1>
... <p>Some text</p>
... </body>"""))
>>> root = t.getroot()
>>> for child in root.iterchildren():
...     print(etree.tostring(child, encoding='unicode', with_tail=False))
...
<h1>A title</h1>
<p>Some text</p>
This can be shortened to a single expression (the default with_tail=True keeps the text between elements):
print(''.join(etree.tostring(child, encoding='unicode') for child in root.iterchildren()))