How to parse XML using a shell script? [duplicate]

Solution 1:

You could try xmllint. Its man page describes it like this:

The xmllint program parses one or more XML files, specified on the command line as xmlfile. It prints various types of output, depending upon the options selected. It is useful for detecting errors both in XML code and in the XML parser itself.

It lets you select elements in the XML document with an XPath expression, using the --xpath option.

On Mac OS X (Yosemite), it is installed by default.
On Ubuntu, if it is not already installed, you can install it by running: sudo apt-get install libxml2-utils
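
As a rough illustration of XPath selection with xmllint, here is a minimal sketch. It assumes an xmllint build that supports the --xpath option. The sample data, the example.com addresses and the list_emails function name are made up for this example.

```shell
#!/bin/bash
# Hypothetical sample data; names and example.com addresses are placeholders.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<spam>
  <victims>
    <victim><name>Alice</name><email>alice@example.com</email></victim>
    <victim><name>Bob</name><email>bob@example.com</email></victim>
  </victims>
</spam>
EOF

# Print the <email> text of every <victim> in the given file, one per line.
list_emails() {
  local file=$1 count i email
  # count() returns a number, which xmllint prints as plain text.
  count=$(xmllint --xpath 'count(//victim)' "$file") || return 1
  for ((i = 1; i <= count; i++)); do
    # string() yields a single value per call, so the result does not depend
    # on how a given xmllint version separates multiple matched nodes.
    email=$(xmllint --xpath "string((//victim)[$i]/email)" "$file")
    printf '%s\n' "$email"
  done
}

list_emails "$xml_file"
rm -f "$xml_file"
```

Querying one string at a time is slower than a single //victim/email/text() query, but it sidesteps differences in how xmllint versions print several matched nodes.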

Solution 2:

Here's a full working example.
If you only need to extract email addresses, you could do something like this:
1) Suppose the XML file spam.xml looks like this:

<spam>
<victims>
  <victim>
    <name>The Pope</name>
    <email>[email protected]</email>
    <is_satan>0</is_satan>
  </victim>
  <victim>
    <name>George Bush</name>
    <email>[email protected]</email>
    <is_satan>1</is_satan>
  </victim>
  <victim>
    <name>George Bush Jr</name>
    <email>[email protected]</email>
    <is_satan>0</is_satan>
  </victim>
</victims>
</spam>

2) You can get the emails and process them with this short bash code:

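#!/bin/bash
# Collect the text of every <email> element, one array entry per matched line.
# Requires GNU grep for -P (Perl-compatible regex) and bash 4+ for mapfile.
mapfile -t emails < <(grep -oP '(?<=email>)[^<]+' "/my_path/spam.xml")

# Loop over the array indices; quoting keeps each index a single word.
for i in "${!emails[@]}"
do
  echo "$i" "${emails[$i]}"
  # instead of echo use the values to send emails, etc
done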

The result of this example is:

0 [email protected]
1 [email protected]
2 [email protected]

Important note:
Don't use this for serious work. It is fine for playing around, getting quick results, learning grep, and so on, but for production you should find, learn and use a proper XML parser.
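
To make the warning concrete, here is a small sketch of one way the grep pattern from the script above can silently miss data. The two input strings and the grep_emails function name are hypothetical. It assumes GNU grep, since -P is not available in every grep.

```shell
#!/bin/bash
# The same pattern as in the script above, wrapped in a function that
# reads the XML from its first argument instead of a file.
grep_emails() {
  printf '%s\n' "$1" | grep -oP '(?<=email>)[^<]+'
}

# Hypothetical inputs: both are well-formed XML carrying the same address.
plain='<email>alice@example.com</email>'
with_attr='<email type="work">alice@example.com</email>'

# The lookbehind needs the literal text "email>" right before the address.
grep_emails "$plain"

# With an attribute, the opening tag no longer ends in "email>", so grep
# finds nothing and exits with a non-zero status.
grep_emails "$with_attr" || echo "no match"
```

A real XML parser treats both inputs as an email element with the same text content, which is why it is the safer choice once the input is not fully under your control.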

Solution 3:

There's also xmlstarlet (which is available for Windows as well).

http://xmlstar.sourceforge.net/doc/xmlstarlet.txt
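
As a rough sketch of how xmlstarlet might be used on a file shaped like spam.xml from Solution 2, the following selects only the addresses whose is_satan flag is 1. The sample data, the example.com addresses and the flagged_emails function name are assumptions for illustration. On some systems the executable is installed as xml rather than xmlstarlet.

```shell
#!/bin/bash
# Hypothetical sample data; names and example.com addresses are placeholders.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<spam>
  <victims>
    <victim><name>Alice</name><email>alice@example.com</email><is_satan>0</is_satan></victim>
    <victim><name>Bob</name><email>bob@example.com</email><is_satan>1</is_satan></victim>
  </victims>
</spam>
EOF

# Print the email of every <victim> whose <is_satan> value is 1.
#   sel  select data using XPath
#   -t   start an output template
#   -m   iterate over each node matching the XPath expression
#   -v   print the value of an XPath relative to the current match
#   -n   print a newline after each match
flagged_emails() {
  xmlstarlet sel -t -m '//victim[is_satan=1]' -v 'email' -n "$1"
}

flagged_emails "$xml_file"
rm -f "$xml_file"
```

Filtering on a sibling element like this is awkward with grep but is a single predicate in XPath.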

Solution 4:

I am surprised no one has mentioned xmlsh. Its mission statement:

"A command line shell for XML. Based on the philosophy and design of the Unix shells."

xmlsh provides a familiar scripting environment, but one specifically tailored for scripting XML processes.

The xmlsh documentation provides a list of its shell-like commands.

I use the xed command a lot. It is the XML equivalent of sed and allows XPath-based search and replace.