How to parse XML using shellscript? [duplicate]
Solution 1:
You could try xmllint
The xmllint program parses one or more XML files, specified on the command line as xmlfile. It prints various types of output, depending upon the options selected. It is useful for detecting errors both in XML code and in the XML parser itself.
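It lets you select elements in the XML document by XPath using the --xpath option. The --pattern option accepts only a restricted XPath subset and is intended mainly for debugging the pattern engine.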
On Mac OS X (Yosemite), it is installed by default.
On Ubuntu, if it is not already installed, you can run apt-get install libxml2-utils
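As a rough sketch of what an XPath query with xmllint can look like (assuming your xmllint build supports --xpath; the sample file and the variable names below are made up for illustration):

```shell
#!/bin/bash
# Hypothetical sample data, written to a temp file so the sketch is self-contained.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<spam>
  <victims>
    <victim><name>The Pope</name><is_satan>0</is_satan></victim>
    <victim><name>George Bush</name><is_satan>1</is_satan></victim>
  </victims>
</spam>
EOF

if command -v xmllint >/dev/null 2>&1; then
    # string(...) yields the text of the first node the expression matches
    first_name=$(xmllint --xpath 'string(//victim[1]/name)' "$xml_file")
    # count(...) yields the number of matching nodes
    victim_count=$(xmllint --xpath 'count(//victim)' "$xml_file")
    echo "first name: $first_name"
    echo "victims: $victim_count"
else
    echo "xmllint not found; install it first (libxml2-utils on Ubuntu)" >&2
fi

rm -f "$xml_file"
```

Using string() and count() keeps the output to a single value per call, which is easier to capture in a shell variable than a raw node set.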
Solution 2:
Here's a full working example.
If it's only extracting email addresses you could just do something like:
1) Suppose the XML file spam.xml looks like this:
<spam>
<victims>
<victim>
<name>The Pope</name>
<email>[email protected]</email>
<is_satan>0</is_satan>
</victim>
<victim>
<name>George Bush</name>
<email>[email protected]</email>
<is_satan>1</is_satan>
</victim>
<victim>
<name>George Bush Jr</name>
<email>[email protected]</email>
<is_satan>0</is_satan>
</victim>
</victims>
</spam>
2) You can get the emails and process them with this short bash code:
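#!/bin/bash
# Read one match per line into an array; mapfile (bash 4+) avoids word splitting on spaces
mapfile -t emails < <(grep -oP '(?<=<email>)[^<]+' "/my_path/spam.xml")
for i in "${!emails[@]}"
do
echo "$i" "${emails[$i]}"
# instead of echo use the values to send emails, etc
done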
The result of this example is:
0 [email protected]
1 [email protected]
2 [email protected]
Important note:
Don't use this for serious matters. This is OK for playing around, getting quick results, learning grep, etc. but you should definitely look for, learn and use an XML parser for production (see Micha's comment below).
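To make the warning concrete, here is a small sketch of two ways a grep lookbehind pattern like the one in the script above can go wrong. The input and the addresses (on the reserved example.invalid domain) are made up for the demonstration, and GNU grep with -P support is assumed:

```shell
#!/bin/bash
# Hypothetical input: one address sits inside an XML comment,
# the other is in an element that carries an attribute.
xml='<victims>
<!--
<email>old@example.invalid</email>
-->
<email type="work">new@example.invalid</email>
</victims>'

# grep works line by line and knows nothing about XML structure
matches=$(printf '%s\n' "$xml" | grep -oP '(?<=<email>)[^<]+')
echo "$matches"
```

grep reports the commented-out address, because it cannot tell that the line is inside a comment, and it misses the live one, because the opening tag is no longer the literal text <email>. An XML parser handles both cases correctly.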
Solution 3:
There's also xmlstarlet (which is available for Windows as well).
http://xmlstar.sourceforge.net/doc/xmlstarlet.txt
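A minimal sketch of pulling values out with xmlstarlet's sel subcommand, assuming the binary is installed under the name xmlstarlet (some packages install it as xml). The sample data and variable names are illustrative only:

```shell
#!/bin/bash
# Hypothetical sample data, written to a temp file so the sketch is self-contained.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<spam>
  <victims>
    <victim><name>The Pope</name><is_satan>0</is_satan></victim>
    <victim><name>George Bush</name><is_satan>1</is_satan></victim>
  </victims>
</spam>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
    # -t starts a template, -m iterates over every node matching the XPath,
    # -v prints the value of a relative XPath, -n emits a newline after each match
    names=$(xmlstarlet sel -t -m '//victim' -v 'name' -n "$xml_file")
    echo "$names"
else
    echo "xmlstarlet not found" >&2
fi

rm -f "$xml_file"
```

The -m/-v/-n combination prints one value per line, which fits naturally into a while read loop or mapfile.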
Solution 4:
I am surprised no one has mentioned xmlsh. The mission statement:
A command line shell for XML, based on the philosophy and design of the Unix shells.
xmlsh provides a familiar scripting environment, but specifically tailored for scripting xml processes.
A list of shell-like commands is provided in the xmlsh documentation.
I use the xed command a lot; it is the equivalent of sed for XML and allows XPath-based search and replace.