Missing some text when iterating XML elements in Python
Solution 1:
Try this:
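from lxml import etree

tree = etree.fromstring("<foo> AAA <bar> BBB </bar> XXX </foo>")
foos = tree.xpath('//foo')
for foo in foos:
    # iter() yields foo itself, then each descendant in document order
    for j in foo.iter():
        # text: content before the first child; tail: content after the end tag
        print(j.tag, j.text, j.tail)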
Output:
foo AAA None
bar BBB XXX
The tail attribute holds the text that follows the element's end tag. That is why XXX shows up as the tail of bar rather than as part of foo's text. tail is a peculiarity of lxml and ElementTree compared to other XML models, such as DOM. See http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/etree-view.html for more information.
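To make the text/tail split concrete, here is a minimal sketch that collects every text fragment in document order. It assumes the standard library's xml.etree.ElementTree instead of lxml (both expose text and tail the same way), and text_pieces is a hypothetical helper name, not part of either library.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<foo> AAA <bar> BBB </bar> XXX </foo>")


def text_pieces(element):
    """Yield every text fragment under element in document order."""
    # Text between the element's start tag and its first child
    if element.text:
        yield element.text
    for child in element:
        # Everything inside the child
        yield from text_pieces(child)
        # Text between the child's end tag and the next sibling (or parent end)
        if child.tail:
            yield child.tail


pieces = [p.strip() for p in text_pieces(root)]
print(pieces)
```

If only the concatenated text is needed, the built-in root.itertext() should walk the same fragments without a custom helper.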
Solution 2:
You also have to take node.tail into account (or check for it).
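A short sketch of what "checking for it" might look like: both text and tail can be None, so each is guarded before stripping. This again assumes xml.etree.ElementTree from the standard library as a stand-in for lxml, and the rows list is only there for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<foo> AAA <bar> BBB </bar> XXX </foo>")

rows = []
for node in root.iter():
    # Either attribute is None when there is no text in that position
    text = (node.text or "").strip()
    tail = (node.tail or "").strip()
    rows.append((node.tag, text, tail))
    print(node.tag, repr(text), repr(tail))
```

The root element has no tail here, so the guard is what keeps .strip() from being called on None.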