Extract XML Value in bash script [duplicate]
As Charles Duffey has stated, XML parsers are best parsed with a proper XML parsing tools. For one time job the following should work.
grep -oPm1 "(?<=<title>)[^<]+"
Test:
$ echo "$data"
<item>
<title>15:54:57 - George:</title>
<description>Diane DeConn? You saw Diane DeConn!</description>
</item>
<item>
<title>15:55:17 - Jerry:</title>
<description>Something huh?</description>
$ title=$(grep -oPm1 "(?<=<title>)[^<]+" <<< "$data")
$ echo "$title"
15:54:57 - George:
XMLStarlet or another XPath engine is the correct tool for this job.
For instance, with data.xml
containing the following:
<root>
<item>
<title>15:54:57 - George:</title>
<description>Diane DeConn? You saw Diane DeConn!</description>
</item>
<item>
<title>15:55:17 - Jerry:</title>
<description>Something huh?</description>
</item>
</root>
...you can extract only the first title with the following:
xmlstarlet sel -t -m '//title[1]' -v . -n <data.xml
Trying to use sed for this job is troublesome. For instance, the regex-based approaches won't work if the title has attributes; won't handle CDATA sections; won't correctly recognize namespace mappings; can't determine whether a portion of the XML documented is commented out; won't unescape attribute references (such as changing Brewster & Jobs
to Brewster & Jobs
), and so forth.