How would you compare two XML Documents?

As part of the base class for some extensive unit testing, I am writing a helper function which recursively compares the nodes of one XmlDocument object to another in C# (.NET). Some requirements of this:

  • The first document is the source, i.e. what I want the XML document to look like. The second is the one I want to find differences in, and it must not contain extra nodes that are not in the first document.
  • Must throw an exception when too many significant differences are found, and it should be easily understood by a human glancing at the description.
  • Child element order is important; attributes can be in any order.
  • Some attributes are ignorable; specifically xsi:schemaLocation and xmlns:xsi, though I would like to be able to pass in which ones are.
  • Prefixes for namespaces must match in both attributes and elements.
  • Whitespace between elements is irrelevant.
  • Elements will either have child elements or InnerText, but not both.

While I'm scraping something together: has anyone written such code, and would it be possible to share it here?

As an aside, what would you call the first and second documents? I've been referring to them as "source" and "target", but that feels wrong, since the source is what I want the target to look like, and I throw an exception if it doesn't.


Solution 1:

Microsoft has an XML diff API that you can use.

Unofficial NuGet: https://www.nuget.org/packages/XMLDiffPatch.

Solution 2:

I googled up a more complete list of solutions to this problem today. I am going to try one of them soon:

  • http://xmlunit.sourceforge.net/
  • http://msdn.microsoft.com/en-us/library/aa302294.aspx
  • http://jolt.codeplex.com/wikipage?title=Jolt.Testing.Assertions.XML.Adaptors
  • http://www.codethinked.com/checking-xml-for-semantic-equivalence-in-c
  • https://vkreynin.wordpress.com/tag/xml/
  • http://gandrusz.blogspot.com/2008/07/recently-i-have-run-into-usual-problem.html
  • http://xmlspecificationcompare.codeplex.com/
  • https://github.com/netbike/netbike.xmlunit

Solution 3:

This code doesn't satisfy all your requirements, but it's simple and I'm using it for my unit tests. Attribute order doesn't matter, but element order does. Element inner text is not compared. I also ignore case when comparing attribute values, but you can easily remove that.

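// Requires: using System; using System.Linq; using System.Xml.Linq;
public static bool XMLCompare(XElement primary, XElement secondary)
{
    // Element names (including namespace) must match.
    if (primary.Name != secondary.Name)
        return false;

    // Attribute order is irrelevant: compare the counts, then look each attribute up by name.
    // Checking the count unconditionally also catches extra attributes on the secondary element.
    if (primary.Attributes().Count() != secondary.Attributes().Count())
        return false;
    foreach (XAttribute attr in primary.Attributes()) {
        // Use the full XName (not LocalName) so namespaced attributes are found.
        XAttribute other = secondary.Attribute(attr.Name);
        if (other == null)
            return false;
        // Values are compared case-insensitively; switch to StringComparison.Ordinal for an exact match.
        if (!string.Equals(attr.Value, other.Value, StringComparison.OrdinalIgnoreCase))
            return false;
    }

    // Element order matters: compare children pairwise by position.
    var primaryChildren = primary.Elements().ToList();
    var secondaryChildren = secondary.Elements().ToList();
    if (primaryChildren.Count != secondaryChildren.Count)
        return false;
    for (var i = 0; i < primaryChildren.Count; i++) {
        if (!XMLCompare(primaryChildren[i], secondaryChildren[i]))
            return false;
    }
    return true;
}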
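
For anyone who needs to get closer to the requirements in the question, here is a rough sketch of one possible extension. It is not part of the original answer. It takes a caller-supplied set of attribute names to ignore, compares the text of leaf elements, and returns a human-readable description of the first difference instead of a bool. The names XmlCompareSketch, FirstDifference and ignored are made up for illustration. It does not check namespace prefixes, and it stops at the first difference rather than counting them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlCompareSketch
{
    // Returns null when the elements match, otherwise a description of the first difference.
    // "ignored" holds the attribute names to skip (hypothetical parameter, supplied by the caller).
    public static string FirstDifference(XElement expected, XElement actual, ISet<XName> ignored)
    {
        // Build a simple path such as /root/child from the expected element's ancestors.
        string path = "/" + string.Join("/",
            expected.AncestorsAndSelf().Reverse().Select(e => e.Name.LocalName));

        if (expected.Name != actual.Name)
            return path + ": expected element '" + expected.Name + "' but found '" + actual.Name + "'";

        // Attribute order is irrelevant, so index the actual attributes by name.
        Dictionary<XName, string> remaining = actual.Attributes()
            .Where(a => !ignored.Contains(a.Name))
            .ToDictionary(a => a.Name, a => a.Value);

        foreach (XAttribute attr in expected.Attributes().Where(a => !ignored.Contains(a.Name)))
        {
            string value;
            if (!remaining.TryGetValue(attr.Name, out value))
                return path + ": missing attribute '" + attr.Name + "'";
            if (value != attr.Value)
                return path + ": attribute '" + attr.Name + "' expected '" + attr.Value
                    + "' but found '" + value + "'";
            remaining.Remove(attr.Name);
        }

        // Anything still in the dictionary exists only in the actual document.
        if (remaining.Count > 0)
            return path + ": unexpected attribute '" + remaining.Keys.First() + "'";

        List<XElement> expectedChildren = expected.Elements().ToList();
        List<XElement> actualChildren = actual.Elements().ToList();
        if (expectedChildren.Count != actualChildren.Count)
            return path + ": expected " + expectedChildren.Count
                + " child elements but found " + actualChildren.Count;

        // Leaf elements carry text only, so compare their values directly.
        if (expectedChildren.Count == 0 && expected.Value != actual.Value)
            return path + ": expected text '" + expected.Value + "' but found '" + actual.Value + "'";

        // Child order matters: compare pairwise by position.
        for (int i = 0; i < expectedChildren.Count; i++)
        {
            string diff = FirstDifference(expectedChildren[i], actualChildren[i], ignored);
            if (diff != null)
                return diff;
        }
        return null;
    }
}
```

To ignore the two attributes mentioned in the question, the set could contain XNamespace.Xmlns + "xsi" and XNamespace.Get("http://www.w3.org/2001/XMLSchema-instance") + "schemaLocation". A test helper could then throw an exception carrying the returned string whenever it is not null.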