Missing some text when iterating XML elements in Python

Solution 1:

Try this:

from lxml import etree

tree = etree.fromstring("<foo> AAA <bar> BBB </bar> XXX </foo>")
foos = tree.xpath('//foo')

for foo in foos:
    for j in foo.iter():
        print(j.tag, j.text, j.tail)

Output:

foo  AAA  None
bar  BBB   XXX 

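The tail attribute holds the text that follows the element's end tag, up to the next tag. That is why " XXX " shows up as the tail of bar and not as part of foo's text.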

The tail attribute is a peculiarity of lxml and ElementTree compared to other XML models, such as DOM. See http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/etree-view.html for more information.
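
If the goal is to get all of the text back in document order, text and tail can be combined recursively. The following is only a sketch: collect_text is a made-up helper name, and it uses the standard library's xml.etree.ElementTree (which has the same text/tail model) so it runs without lxml installed.

```python
import xml.etree.ElementTree as ET


def collect_text(element):
    """Join an element's text, its children's text, and each child's tail.

    Hypothetical helper for illustration; not part of lxml or ElementTree.
    """
    parts = []
    if element.text:              # text before the first child
        parts.append(element.text)
    for child in element:
        parts.append(collect_text(child))
        if child.tail:            # text after the child's end tag
            parts.append(child.tail)
    return "".join(parts)


root = ET.fromstring("<foo> AAA <bar> BBB </bar> XXX </foo>")
full_text = collect_text(root)
print(repr(full_text))
```

The built-in itertext() method should yield the same pieces, so "".join(root.itertext()) is the shorter route when no custom handling is needed.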

Solution 2:

You also have to take

node.tail

into account (or check for it).
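
A small sketch of what "checking for it" can look like: both text and tail may be None, so it helps to normalize them before stripping or concatenating. The variable name rows is just for illustration, and the standard library's ElementTree is assumed here in place of lxml.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<foo> AAA <bar> BBB </bar> XXX </foo>")

# Replace a missing text or tail (None) with an empty string before stripping.
rows = [
    (node.tag, (node.text or "").strip(), (node.tail or "").strip())
    for node in root.iter()
]

for tag, text, tail in rows:
    print(tag, repr(text), repr(tail))
```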