Get nodes where child node contains an attribute

Suppose I have the following XML:

<book category="CLASSICS">
  <title lang="it">Purgatorio</title>
  <author>Dante Alighieri</author>
  <year>1308</year>
  <price>30.00</price>
</book>

<book category="CLASSICS">
  <title lang="it">Inferno</title>
  <author>Dante Alighieri</author>
  <year>1308</year>
  <price>30.00</price>
</book>

<book category="CHILDREN">
  <title lang="en">Harry Potter</title>
  <author>J K. Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>

<book category="WEB">
  <title lang="en">XQuery Kick Start</title>
  <author>James McGovern</author>
  <author>Per Bothner</author>
  <author>Kurt Cagle</author>
  <author>James Linn</author>
  <author>Vaidyanathan Nagarajan</author>
  <year>2003</year>
  <price>49.99</price>
</book>

<book category="WEB">
  <title lang="en">Learning XML</title>
  <author>Erik T. Ray</author>
  <year>2003</year>
  <price>39.95</price>
</book>

I would like to do an xpath that gets back all book nodes that have a title node with a language attribute of "it".

My attempt looked something like this:

//book[title[@lang='it']]

But that didn't work. I expect to get back the nodes:

<book category="CLASSICS">
  <title lang="it">Purgatorio</title>
  <author>Dante Alighieri</author>
  <year>1308</year>
  <price>30.00</price>
</book>

<book category="CLASSICS">
  <title lang="it">Inferno</title>
  <author>Dante Alighieri</author>
  <year>1308</year>
  <price>30.00</price>
</book>

Any hints?


Solution 1:

Try

//book[title/@lang = 'it']

This reads:

  • get all book elements
    • that have at least one title
      • which has an attribute lang
        • with a value of "it"

You may find this helpful — it's an article entitled "XPath in Five Paragraphs" by Ronald Bourret.

But in all honesty, //book[title[@lang='it']] and the above should be equivalent, unless your XPath engine has "issues." So it could be something in the code or sample XML that you're not showing us -- for example, your sample is an XML fragment. Could it be that the root element has a namespace, and you aren't counting for that in your query? And you only told us that it didn't work, but you didn't tell us what results you did get.

Solution 2:

Years later, but a useful option would be to utilize XPath Axes (https://www.w3schools.com/xml/xpath_axes.asp). More specifically, you are looking to use the descendants axes.

I believe this example would do the trick:

//book[descendant::title[@lang='it']]

This allows you to select all book elements that contain a child title element (regardless of how deep it is nested) containing language attribute value equal to 'it'.

I cannot say for sure whether or not this answer is relevant to the year 2009 as I am not 100% certain that the XPath Axes existed at that time. What I can confirm is that they do exist today and I have found them to be extremely useful in XPath navigation and I am sure you will as well.