xpath: get nodes that do not have an X ancestor

I want all nodes of an XML document that are not descendants of X nodes.

(My actual problem is a little more complex, but I'm stuck with the "are not descendants" part right now).


Solution 1:

If you translate "are not descendants of X" to "have no X ancestor", you get the expression //*[not(ancestor::X)]. This returns every element in the document that is not a descendant of an element named "X". The X elements themselves are returned too, unless they are nested inside another X.
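To make the selection concrete, here is a rough Python sketch of what that expression computes. Python's standard xml.etree.ElementTree only supports a subset of XPath and has no ancestor axis, so the logic is emulated by hand; the function name and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: X has two levels of descendants.
SAMPLE = "<root><a><b/></a><X><c><d/></c></X><e/></root>"

def elements_without_ancestor(root, name):
    """Emulate //*[not(ancestor::name)]: elements, in document order,
    that have no ancestor called `name`."""
    result = []

    def visit(elem, inside):
        # `inside` is True once some ancestor of elem is named `name`.
        if not inside:
            result.append(elem)
        for child in elem:
            visit(child, inside or elem.tag == name)

    visit(root, False)
    return result

root = ET.fromstring(SAMPLE)
print([e.tag for e in elements_without_ancestor(root, "X")])
# ['root', 'a', 'b', 'X', 'e']
```

Note that X itself is in the result while c and d are not.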

Solution 2:

jarnbjo points out the intuitive way to do this: use //*[not(ancestor::X)]. This has the great merit that it works irrespective of how your document is structured, and it's what you should use in most circumstances.

But if you have a very large document, it may be extremely inefficient. That's a really expensive query. It tells the XPath processor to visit every element in the document and search its ancestor axis for an element named X. While it's possible that the XPath processor is smart enough to know that it doesn't need to visit the descendants of X to evaluate that query, it's not likely.

If you have some information about where the X element is, and you're careful, you can write a more efficient query. For instance, if X is a child of the top-level element, and it has a lot of descendants, this will be much faster:

/* | /*/* | /*/*[not(name()='X')]//*

That finds the top-level element, all of its immediate children, and the descendants of any of its immediate children not named X. It won't examine any of X's descendants.
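The pruning idea behind that expression can be sketched in Python. This is only an emulation with the standard xml.etree.ElementTree module, not a real XPath evaluation, and the function name and sample document are invented for the example. It assumes, as above, that X is a child of the top-level element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with X directly under the root.
SAMPLE = "<root><a><b/></a><X><c><d/></c></X><e/></root>"

def skip_top_level_x(root, name="X"):
    """Emulate /* | /*/* | /*/*[not(name()='X')]//* in document order."""
    selected = [root]                      # /*
    for child in root:
        selected.append(child)             # /*/*  (X itself is kept)
        if child.tag != name:
            # Descendants of non-X children only; X's subtree is never walked.
            selected.extend(d for d in child.iter() if d is not child)
    return selected

root = ET.fromstring(SAMPLE)
print([e.tag for e in skip_top_level_x(root)])
# ['root', 'a', 'b', 'X', 'e']
```

The result matches the simple query on this document, but the loop never descends into X.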

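Similarly, if you know that X is close to the bottom of the tree, so that nothing is nested more than three levels below it, this query may be more efficient:

//*[not(ancestor::*[position() <= 3][self::X])]

because it won't examine the entire ancestor axis for each node it tests, just the three nearest ancestors (ancestor is a reverse axis, so position 1 is the parent). The self::X test checks the ancestor's own name; a bare [X] would instead test whether that ancestor has a child named X. (Unless the XPath processor is dumb enough to examine every node on an axis when it's performing tests that use position(), which it might be.)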
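Limiting the check to the nearest few ancestors has a catch that a small Python emulation makes visible. Again this is a sketch using the standard xml.etree.ElementTree module with a hand-built parent map, not an XPath engine; the function name, the limit parameter, and the sample document are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: f sits four levels below X.
DEEP = "<root><X><c><d><e><f/></e></d></c></X></root>"

def no_x_in_nearest_ancestors(root, name="X", limit=3):
    """Keep elements whose `limit` nearest ancestors are not named `name`."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    selected = []
    for elem in root.iter():
        node, hit = elem, False
        for _ in range(limit):             # parent first, then grandparent, ...
            node = parent.get(node)
            if node is None:               # ran off the top of the tree
                break
            if node.tag == name:
                hit = True
                break
        if not hit:
            selected.append(elem)
    return selected

root = ET.fromstring(DEEP)
print([e.tag for e in no_x_in_nearest_ancestors(root)])
# ['root', 'X', 'f']
```

Here f is a descendant of X but still gets selected, because X is its fourth ancestor. The shortcut is only safe when you know how deep the subtrees under X can go.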

As I said, though, most of the time the simplest version's going to be the best, and most of the time it's what I'd use myself.