XPath with regex match on an attribute value
All -
I've searched and tinkered around for hours in an effort to crack this one, but I'm still having problems. I have the XML data below:
<game id="2009/05/02/arimlb-milmlb-1" pk="244539">
<team id="109" name="Arizona" home_team="false">
<event number="9" inning="1" description="Felipe Lopez doubles to left fielder Chris Duffy. "/>
<event number="15" inning="1" description="Augie Ojeda flies out to center fielder Mike Cameron. "/>
<event number="23" inning="1" description="Chad Tracy doubles to right fielder Joe Sanchez. "/>
<event number="52" inning="2" description="Mark Reynolds lines out to left fielder Chris Duffy. "/>
<!-- more data here -->
</team>
</game>
I'm trying to get the total number of event nodes that contain the text ' doubles ' in the value of the description attribute. This is what I've been trying so far, to no avail (irb throws an error):
"/game/team/event/@description[matches(.,' doubles ')]"
Since I'm just trying to match a fragment of the value of the description attribute, it's possible to use the XPath 2.0 function 'matches', right? If so, what am I doing wrong?
Thanks in advance for any help!
"I'm trying to get the total number of event nodes that contain the text ' doubles ' in the value of the description attribute."
matches() is a standard XPath 2.0 function. It is not available in XPath 1.0.
With an XPath 1.0 engine, you can use contains() instead:
count(/*/*/event[contains(@description, ' doubles ')])
To verify this, here is a small XSLT transformation which just outputs the result of evaluating the above XPath expression on the provided XML document:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of select=
"count(/*/*/event[contains(@description, ' doubles ')])"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied to the provided XML document:
<game id="2009/05/02/arimlb-milmlb-1" pk="244539">
<team id="109" name="Arizona" home_team="false">
<event number="9" inning="1" description="Felipe Lopez doubles to left fielder Chris Duffy. "/>
<event number="15" inning="1" description="Augie Ojeda flies out to center fielder Mike Cameron. "/>
<event number="23" inning="1" description="Chad Tracy doubles to right fielder Joe Sanchez. "/>
<event number="52" inning="2" description="Mark Reynolds lines out to left fielder Chris Duffy. "/>
<!-- more data here -->
</team>
</game>
the wanted, correct result is produced:
2
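If no XSLT processor is at hand, the same count can be cross-checked from a scripting language. The following is only a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module; its XPath subset has no contains() function, so the substring test is done in Python rather than in the XPath expression itself. The function name count_events_containing is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question (the "more data" comment is left out).
XML = """<game id="2009/05/02/arimlb-milmlb-1" pk="244539">
  <team id="109" name="Arizona" home_team="false">
    <event number="9" inning="1" description="Felipe Lopez doubles to left fielder Chris Duffy. "/>
    <event number="15" inning="1" description="Augie Ojeda flies out to center fielder Mike Cameron. "/>
    <event number="23" inning="1" description="Chad Tracy doubles to right fielder Joe Sanchez. "/>
    <event number="52" inning="2" description="Mark Reynolds lines out to left fielder Chris Duffy. "/>
  </team>
</game>"""


def count_events_containing(xml_text, fragment):
    """Count <event> elements whose description attribute contains fragment.

    Hypothetical helper: it mirrors count(/*/*/event[contains(@description, ...)])
    but performs the substring check in Python, because ElementTree's
    limited XPath support does not include contains().
    """
    root = ET.fromstring(xml_text)  # root is the <game> element
    return sum(
        1
        for event in root.findall("./team/event")
        if fragment in event.get("description", "")
    )


doubles_count = count_events_containing(XML, " doubles ")
print(doubles_count)  # prints 2 for the sample data above
```

With a library that implements full XPath 1.0, the count(...) expression above can be passed to the engine directly and no filtering in the host language is needed.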
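Try the following variants. The first two use matches() and therefore need an XPath 2.0 engine; the third uses contains() and also works in XPath 1.0: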
/game/team/event[matches(@description, ' doubles ')]/@description
/game/team/event[matches(@description, '^.*?doubles.*$')]/@description
/game/team/event[contains(@description, ' doubles ')]/@description
"Since I'm just trying to match a fragment of the value of the description attribute, it's possible to use the XPath 2.0 function 'matches', right?"
Yes, as long as you are using an XPath 2.0 engine to evaluate the XPath expression.
If you were to execute that XPath using an XPath 2.0 engine, it would select the appropriate @description attributes.
"If so, what am I doing wrong?"
If you are using an XPath 2.0 engine, your issue may be that you have selected a sequence of nodes, but are expecting the count.
If you want to return the count of those attributes, you could use the count() function:
count(/game/team/event/@description[matches(.,' doubles ')])
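If the engine at hand turns out to be XPath 1.0 only and a real regular expression is needed, one workaround is to select the nodes first and apply the regex in the host language. The sketch below is an illustration only, assuming Python's standard-library re and xml.etree.ElementTree modules; the helper name descriptions_matching and the pattern \bdoubles\b are assumptions made for this example. It also shows the difference between the selected sequence and its count.

```python
import re
import xml.etree.ElementTree as ET

# Sample data copied from the question.
SAMPLE = """<game id="2009/05/02/arimlb-milmlb-1" pk="244539">
  <team id="109" name="Arizona" home_team="false">
    <event number="9" inning="1" description="Felipe Lopez doubles to left fielder Chris Duffy. "/>
    <event number="15" inning="1" description="Augie Ojeda flies out to center fielder Mike Cameron. "/>
    <event number="23" inning="1" description="Chad Tracy doubles to right fielder Joe Sanchez. "/>
    <event number="52" inning="2" description="Mark Reynolds lines out to left fielder Chris Duffy. "/>
  </team>
</game>"""


def descriptions_matching(xml_text, pattern):
    """Return the description values of all <event> elements matching pattern.

    Hypothetical stand-in for matches(): the regex is evaluated by Python's
    re module, not by the XPath engine.
    """
    regex = re.compile(pattern)
    root = ET.fromstring(xml_text)
    return [
        event.get("description")
        for event in root.iter("event")
        if regex.search(event.get("description", ""))
    ]


matched = descriptions_matching(SAMPLE, r"\bdoubles\b")
print(matched)       # the selected values (a sequence, not a number)
print(len(matched))  # the count; prints 2 for the sample data
```

Note that Python's regex dialect is not identical to the one used by the XPath 2.0 matches() function, so patterns may need adjusting when moving between the two.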