Order of XML attributes after DOM processing

When processing XML by means of standard DOM, attribute order is not guaranteed after you serialize back. At last that is what I just realized when using standard java XML Transform API to serialize the output.

However I do need to keep an order. I would like to know if there is any posibility on Java to keep the original order of attributes of an XML file processed by means of DOM API, or any way to force the order (maybe by using an alternative serialization API that lets you set this kind of property). In my case processing reduces to alter the value of some attributes (not all) of a sequence of the same elements with a bunch of attributes, and maybe insert a few more elements.

Is there any "easy" way or do I have to define my own XSLT transformation stylesheet to specify the output and altering the whole input XML file?

Update I must thank all your answers. The answer seems now more obvious than I expected. I never paid any attention to attribute order, since I had never needed it before.

The main reason to require an attribute order is that the resulting XML file just looks different. The target is a configuration file that holds hundreds of alarms (every alarm is defined by a set of attributes). This file usually has little modifications over time, but it is convenient to keep it ordered, since when we need to modify something it is edited by hand. Now and then some projects need light modifications of this file, such as setting one of the attributes to a customer specific code.

I just developed a little application to merge original file (common to all projects) with specific parts of each project (modify the value of some attributes), so project-specific file gets the updates of the base one (new alarm definitions or some attribute values bugfixes). My main motivation to require ordered attributes is to be able to check the output of the application againts the original file by means of a text comparation tool (such as Winmerge). If the format (mainly attribute order) remains the same, the differences can be easily spotted.

I really thought this was possible, since XML handling programs, such as XML Spy, lets you edit XML files and apply some ordering (grid mode). Maybe my only choice is to use one of these programs to manually modify the output file.


Solution 1:

Sorry to say, but the answer is more subtle than "No you can't" or "Why do you need to do this in the first place ?".

The short answer is "DOM will not allow you to do that, but SAX will".

This is because DOM does not care about the attribute order, since it's meaningless as far as the standard is concerned, and by the time the XSL gets hold of the input stream, the info is already lost. Most XSL engine will actually gracefully preserve the input stream attribute order (e.g. Xalan-C (except in one case) or Xalan-J (always)). Especially if you use <xsl:copy*>.

Cases where the attribute order is not kept, best of my knowledge, are. - If the input stream is a DOM - Xalan-C: if you insert your result-tree tags literally (e.g. <elem att1={@att1} .../>

Here is one example with SAX, for the record (inhibiting DTD nagging as well).

SAXParserFactory spf = SAXParserFactoryImpl.newInstance();
spf.setNamespaceAware(true);
spf.setValidating(false);
spf.setFeature("http://xml.org/sax/features/validation", false);
spf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
spf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
SAXParser sp = spf.newSAXParser() ;
Source src = new SAXSource ( sp.getXMLReader(), new InputSource( input.getAbsolutePath() ) ) ;
String resultFileName = input.getAbsolutePath().replaceAll(".xml$", ".cooked.xml" ) ;
Result result = new StreamResult( new File (resultFileName) ) ;
TransformerFactory tf = TransformerFactory.newInstance();
Source xsltSource = new StreamSource( new File ( COOKER_XSL ) );
xsl = tf.newTransformer( xsltSource ) ;
xsl.setParameter( "srcDocumentName", input.getName() ) ;
xsl.setParameter( "srcDocumentPath", input.getAbsolutePath() ) ;

xsl.transform(src, result );

I'd also like to point out, at the intention of many naysayers that there are cases where attribute order does matter.

Regression testing is an obvious case. Whoever has been called to optimise not-so-well written XSL knows that you usually want to make sure that "new" result trees are similar or identical to the "old" ones. And when the result tree are around one million lines, XML diff tools prove too unwieldy... In these cases, preserving attribute order is of great help.

Hope this helps ;-)

Solution 2:

Look at section 3.1 of the XML recommendation. It says, "Note that the order of attribute specifications in a start-tag or empty-element tag is not significant."

If a piece of software requires attributes on an XML element to appear in a specific order, that software is not processing XML, it's processing text that looks superficially like XML. It needs to be fixed.

If it can't be fixed, and you have to produce files that conform to its requirements, you can't reliably use standard XML tools to produce those files. For instance, you might try (as you suggest) to use XSLT to produce attributes in a defined order, e.g.:

<test>
   <xsl:attribute name="foo"/>
   <xsl:attribute name="bar"/>
   <xsl:attribute name="baz"/>
</test>

only to find that the XSLT processor emits this:

<test bar="" baz="" foo=""/>

because the DOM that the processor is using orders attributes alphabetically by tag name. (That's common but not universal behavior among XML DOMs.)

But I want to emphasize something. If a piece of software violates the XML recommendation in one respect, it probably violates it in other respects. If it breaks when you feed it attributes in the wrong order, it probably also breaks if you delimit attributes with single quotes, or if the attribute values contain character entities, or any of a dozen other things that the XML recommendation says that an XML document can do that the author of this software probably didn't think about.

Solution 3:

XML Canonicalisation results in a consistent attribute ordering, primarily to allow one to check a signature over some or all of the XML, though there are other potential uses. This may suit your purposes.

Solution 4:

It's not possible to over-emphasize what Robert Rossney just said, but I'll try. ;-)

The benefit of International Standards is that, when everybody follows them, life is good. All our software gets along peacefully.

XML has to be one of the most important standards we have. It's the basis of "old web" stuff like SOAP, and still 'web 2.0' stuff like RSS and Atom. It's because of clear standards that XML is able to interoperate between different platforms.

If we give up on XML, little by little, we'll get into a situation where a producer of XML will not be able to assume that a consumer of XML will be able to consumer their content. This would have a disasterous affect on the industry.

We should push back very forcefully, on anyone who writes code that does not process XML according to the standard. I understand that, in these economic times, there is a reluctance to offend customers and business partners by saying "no". But in this case, I think it's worth it. We would be in much worse financial shape if we had to hand-craft XML for each business partner.

So, don't "enable" companies who do not understand XML. Send them the standard, with the appropriate lines highlighted. They need to stop thinking that XML is just text with angle brackets in it. It simply does not behave like text with angle brackets in it.

It's not like there's an excuse for this. Even the smallest embedded devices can have full-featured XML parser implementations in them. I have not yet heard a good reason for not being able to parse standard XML, even if one can't afford a fully-featured DOM implementation.