Utility to LOGICALLY compare two XML files?

Right now we are attempting to build gold configurations for our environment. One piece of software that we use relies on large XML files to contain the bulk of its configuration. We want to take our lab environment, catalog it as our "gold configuration" and then be able to audit against that configuration in the future.

Since diff is a textual, line-by-line comparison and NOT a logical comparison, we can't use it to compare files in this case (element and attribute order in these files is not significant, so two logically identical files can still differ as text). What I am looking for is something that can parse the two XML files and compare them element by element. So far we have yet to find any utility that can do this. The OS doesn't matter; I can run it on anything where it will work. The preference is something off the shelf.

Any ideas?

Edit: One issue we have run into is that one vendor's config files will occasionally repeat the same element several times, each time with different attributes. Whatever diff utility we use would need to be able to tell those occurrences apart by their set of attributes, or to treat them all as part of one element. Tall order :)


Solution 1:

Two approaches that I use are (a) to canonicalize both XML files and then compare their serializations, and (b) to use the XPath 2.0 deep-equal() function. Both approaches are OK for telling you whether the files are the same, but not very good at telling you where they differ.
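As a rough illustration of approach (a), here is a minimal sketch using the canonicalize() function from Python's standard library (xml.etree.ElementTree, Python 3.8+). The helper names canonical and same_xml are made up for this example, and strip_text=True is an assumption that indentation whitespace is not meaningful in your files.

```python
import xml.etree.ElementTree as ET

def canonical(xml_text: str) -> str:
    # C14N 2.0 output: attributes are sorted, quoting is normalized and
    # empty elements are written as start/end tag pairs.
    # strip_text=True also trims whitespace around text content, which
    # removes pure indentation (an assumption about the input files).
    return ET.canonicalize(xml_text, strip_text=True)

def same_xml(left: str, right: str) -> bool:
    # Two documents count as equal when their canonical forms match.
    return canonical(left) == canonical(right)

a = '<root b="2" a="1"><item/></root>'
b = """<root a='1' b='2'>
    <item></item>
</root>"""
print(same_xml(a, b))
```

Note that canonicalization leaves element order alone, so two files whose elements are merely reordered still compare as different with this sketch.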

A commercial tool that specializes in this problem is DeltaXML.

If you have things that you consider equivalent, but which aren't equivalent at the XML level - for example, elements in a different order - then you may have to be prepared to do a transformation to normalize the documents before comparison.
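One possible shape for such a normalizing step, sketched in Python with only the standard library (ET.indent needs Python 3.9+): sort sibling elements by an order-independent fingerprint, then run an ordinary line diff over the result. The function names are invented for this example, and the sketch assumes config-style files with no mixed content, since it discards tail text. Because attributes are part of the fingerprint, repeated elements that differ only in their attributes end up in a stable position.

```python
import difflib
import xml.etree.ElementTree as ET

def signature(elem):
    # Order-independent fingerprint: tag, sorted attributes, trimmed text,
    # and the sorted fingerprints of all children.
    return (
        elem.tag,
        sorted(elem.attrib.items()),
        (elem.text or "").strip(),
        sorted(signature(child) for child in elem),
    )

def normalize(elem):
    # Sort children in place by fingerprint, trim text, and drop tail text
    # (assumes no mixed content, as in typical config files).
    for child in elem:
        normalize(child)
    elem[:] = sorted(elem, key=signature)
    elem.text = (elem.text or "").strip() or None
    elem.tail = None
    return elem

def normalized_lines(xml_text):
    root = normalize(ET.fromstring(xml_text))
    ET.indent(root)  # one element per line; Python 3.9+
    # canonicalize() sorts attributes, so attribute order never shows up in the diff.
    return ET.canonicalize(ET.tostring(root, encoding="unicode")).splitlines()

def logical_diff(left, right):
    # Plain unified diff over the normalized serializations.
    return list(difflib.unified_diff(
        normalized_lines(left), normalized_lines(right),
        "left", "right", lineterm=""))

left = '<cfg><host name="b" port="2"/><host name="a" port="1"/></cfg>'
right = '<cfg><host port="1" name="a"/><host port="3" name="b"/></cfg>'
print("\n".join(logical_diff(left, right)))
```

With these two inputs, the reordering of the host elements and of their attributes is ignored, and only the changed port value on the host named "b" is reported.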

Solution 2:

There is a good answer on Super User to the question "How can I diff two XML files?":

$ xmllint --format --exc-c14n one.xml > 1.xml
$ xmllint --format --exc-c14n two.xml > 2.xml
$ diff 1.xml 2.xml

Apologies for any failure to adhere to serverfault conventions ... I'm sure someone will let me know and I will amend appropriately.

Solution 3:

Python-based xmldiff looks like a very attractive solution; it claims to "extract differences between two xml files and to return a set of primitives to apply on source tree to obtain the destination tree."

Example:

a1.xml

<root>
</root>

a2.xml

<root attr="test1">
</root>

xmldiff a1.xml a2.xml:

[append-first, /,
<root attr="test1"/>
]
[remove, /root[2]]

Solution 4:

I wrote a simple Python tool for this called xmldiffs:

Compare two XML files, ignoring element and attribute order.

Usage: xmldiffs [OPTION] FILE1 FILE2

Any extra options are passed to the diff command.

Get it at https://github.com/joh/xmldiffs