python - lxml: enforcing a specific order for attributes

Solution 1:

It looks like lxml serializes attributes in the order you set them:

>>> from lxml import etree as ET
>>> x = ET.Element("x")
>>> x.set('a', '1')
>>> x.set('b', '2')
>>> ET.tostring(x)
b'<x a="1" b="2"/>'
>>> y = ET.Element("y")
>>> y.set('b', '2')
>>> y.set('a', '1')
>>> ET.tostring(y)
b'<y b="2" a="1"/>'

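Note that when you pass attributes as keyword arguments to ET.SubElement(), Python collects them into a dictionary and passes that dictionary to lxml. On older interpreters this loses any ordering you had in the source file, since dictionaries there are unordered (or, rather, their order is determined by string hash values, which may differ from platform to platform or, in fact, from execution to execution). Since Python 3.6 keyword arguments keep their order (PEP 468), and since Python 3.7 dictionaries are guaranteed to preserve insertion order, but whether that order survives serialization may still depend on the lxml version, so calling set() explicitly remains the dependable route.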
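One way to sidestep keyword-argument ordering entirely is to create the child element first and then call set() once per attribute, which relies only on the behaviour shown in the session above. This is a minimal sketch: the helper name sub_element_ordered is hypothetical and not part of lxml, and the fallback import is an assumption added so the sketch also runs where lxml is not installed (the standard-library serializer keeps insertion order from Python 3.8 on).

```python
try:
    from lxml import etree as ET
except ImportError:
    # Assumption: fall back to the stdlib, which keeps set() order on Python 3.8+.
    import xml.etree.ElementTree as ET


def sub_element_ordered(parent, tag, attributes):
    """Create a child of `parent` and set its attributes one at a time, in order.

    `attributes` is any iterable of (name, value) pairs, e.g. a list of tuples.
    Hypothetical helper, not an lxml API.
    """
    child = ET.SubElement(parent, tag)
    for name, value in attributes:
        child.set(name, value)
    return child


root = ET.Element("FileFormats")
fmt = sub_element_ordered(
    root,
    "FileFormat",
    [("ID", "1"), ("Name", "Development Signature"), ("PUID", "dev/1")],
)
serialized = ET.tostring(root).decode("ascii")
print(serialized)
```

Passing a list of pairs rather than a dictionary keeps the intended order explicit regardless of the Python version in use.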

Solution 2:

OrderedDict of attributes

As of lxml 3.3.3 (perhaps also in earlier versions) you can pass an OrderedDict of attributes to the lxml.etree.(Sub)Element constructor and the order will be preserved when using lxml.etree.tostring(root):

attributes = OrderedDict([("ID", str(db.ID)), ("Name", db.name),
                          ("PUID", "fileSig/{}".format(db.ID)),
                          ("Version", ""), ("MIMEType", "")])
sig.fileformat = etree.SubElement(sig.fileformats, "FileFormat", attributes)

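Note that the standard-library ElementTree API (xml.etree.ElementTree) did not preserve attribute order when this was written, even if you provided an OrderedDict to the xml.etree.ElementTree.(Sub)Element constructor: its serializer sorted attributes by name. Since Python 3.8 it keeps the order in which the attributes were given.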

UPDATE: Also note that using the **extra parameter of the lxml.etree.(Sub)Element constructor for specifying attributes does not preserve attribute order:

>>> from lxml.etree import Element, tostring
>>> from collections import OrderedDict
>>> root = Element("root", OrderedDict([("b","1"),("a","2")])) # attrib parameter
>>> tostring(root)
b'<root b="1" a="2"/>' # preserved
>>> root = Element("root", b="1", a="2") # **extra parameter
>>> tostring(root)
b'<root a="2" b="1"/>' # not preserved

Solution 3:

Attribute ordering and readability

As the commenters have mentioned, attribute order has no semantic significance in XML, which is to say it doesn't change the meaning of an element:

<tag attr1="val1" attr2="val2"/>

<!-- means the same thing as: -->

<tag attr2="val2" attr1="val1"/>

There is an analogous characteristic in SQL, where column order doesn't change the meaning of a table definition. XML attributes and SQL columns are a set (not an ordered set), and so all that can "officially" be said about either one of those is whether the attribute or column is present in the set.
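The point about attributes forming a set can be checked with the standard-library parser: two elements that differ only in attribute order parse to equal attribute mappings. This is a minimal sketch using xml.etree.ElementTree; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

first = ET.fromstring('<tag attr1="val1" attr2="val2"/>')
second = ET.fromstring('<tag attr2="val2" attr1="val1"/>')

# .attrib is a plain dict, and dict equality ignores insertion order,
# so the two elements are reported as carrying the same attributes.
same_attributes = first.attrib == second.attrib
print(same_attributes)
```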

That said, the order in which these things appear definitely makes a difference to human readability. In situations where constructs like this are authored in text (e.g. source code) and must be read and interpreted by people, a careful ordering makes a lot of sense to me.

Typical parser behavior

Any XML parser that treated attribute order as significant would be out of compliance with the XML standard. That doesn't mean it can't happen, but in my experience it is certainly unusual. Still, depending on the provenance of the tool you mention, it's a possibility that may be worth testing.

As far as I know, lxml has no mechanism for specifying the order attributes appear in serialized XML, and I would be surprised if it did.

In order to test the behavior I'd be strongly inclined to just write a text-based template to generate enough XML to test it out:

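file_id = 1  # renamed from "id" to avoid shadowing the built-in
name = 'Development Signature'
puid = 'dev/1'
version = '1.0'
mimetype = 'text/x-test-signature'

# Attributes appear in exactly the order written in the template;
# the element is self-closing so the result is well-formed on its own.
template = ('<FileFormat ID="%d" Name="%s" PUID="%s" Version="%s" '
            'MIMEType="%s"/>')

xml = template % (file_id, name, puid, version, mimetype)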
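If the values interpolated into such a template can contain characters like &, < or quotes, the generated text would stop being well-formed. One hedged way to handle that is xml.sax.saxutils.quoteattr from the standard library, which escapes a value and wraps it in quotes. The function name render_file_format and the sample values below are hypothetical, and the self-closing element is an assumption made to keep the fixture parseable.

```python
from xml.sax.saxutils import quoteattr
import xml.etree.ElementTree as ET


def render_file_format(attributes):
    """Render a self-closing FileFormat element with attributes in the given order.

    `attributes` is a list of (name, value) pairs; each value is escaped and
    quoted by quoteattr(). Hypothetical helper intended for test fixtures only.
    """
    rendered = " ".join(
        "%s=%s" % (name, quoteattr(str(value))) for name, value in attributes
    )
    return "<FileFormat %s/>" % rendered


xml_text = render_file_format(
    [("ID", 1), ("Name", "R&D Signature"), ("PUID", "dev/1")]
)
print(xml_text)

# Sanity check: the generated text parses and the raw value round-trips.
parsed = ET.fromstring(xml_text)
print(parsed.get("Name"))
```

Because the text is assembled by hand, the attribute order is whatever the list says, which makes it easy to feed the tool under test both orderings and compare its behaviour.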