Grep and Sed Equivalent for XML Command Line Processing
Solution 1:
I've found xmlstarlet to be pretty good at this sort of thing.
http://xmlstar.sourceforge.net/
It should be available in most distro repositories, too. An introductory tutorial is here:
http://www.ibm.com/developerworks/library/x-starlet.html
Solution 2:
Some promising tools:
nokogiri: parses HTML/XML DOMs in Ruby using XPath and CSS selectors
hpricot: deprecated
fxgrep: Uses its own XPath-like syntax to query documents. Written in SML, so installation may be difficult.
LT XML: XML toolkit derived from SGML tools, including sggrep, sgsort, xmlnorm and others. Uses its own query syntax. The documentation is very formal. Written in C. LT XML 2 claims support of XPath, XInclude and other W3C standards.
xmlgrep2: simple and powerful searching with XPath. Written in Perl using XML::LibXML and libxml2.
XQSharp: Supports XQuery, the extension to XPath. Written for the .NET Framework.
xml-coreutils: Laird Breyer's toolkit equivalent to GNU coreutils. Discussed in an interesting essay on what the ideal toolkit should include.
xmldiff: Simple tool for comparing two xml files.
xmltk: doesn't seem to have a package in Debian, Ubuntu, Fedora, or MacPorts, hasn't had a release since 2007, and uses non-portable build automation.
xml-coreutils seems the best documented and most UNIX-oriented.
Solution 3:
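There is also the xml2 and 2xml pair. xml2 flattens XML into a line-oriented path=value form, and 2xml converts that form back into XML, so the usual string editing tools can process XML in between.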
Example. q.xml:
<?xml version="1.0"?>
<foo>
text
more text
<textnode>ddd</textnode><textnode a="bv">dsss</textnode>
<![CDATA[ asfdasdsa <foo> sdfsdfdsf <bar> ]]>
</foo>
xml2 < q.xml
/foo=
/foo= text
/foo= more text
/foo=
/foo/textnode=ddd
/foo/textnode
/foo/textnode/@a=bv
/foo/textnode=dsss
/foo=
/foo= asfdasdsa <foo> sdfsdfdsf <bar>
/foo=
xml2 < q.xml | grep textnode | sed 's!/foo!/bar/baz!' | 2xml
<bar><baz><textnode>ddd</textnode><textnode a="bv">dsss</textnode></baz></bar>
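Since the flattened form holds one path=value pair per line, plain line tools can query and edit it without an XML parser. A minimal sketch, assuming the relevant lines of the xml2 output above are pasted into a here-document (so it runs even where xml2 is not installed); the variable names flat, values, attrs and edited are made up for illustration:

```shell
#!/bin/sh
# Flattened lines copied from the `xml2 < q.xml` output above.
# With xml2 installed, flat=$(xml2 < q.xml) could be used instead.
flat=$(cat <<'EOF'
/foo/textnode=ddd
/foo/textnode
/foo/textnode/@a=bv
/foo/textnode=dsss
EOF
)

# grep-like: text content of every /foo/textnode element.
values=$(printf '%s\n' "$flat" | sed -n 's!^/foo/textnode=!!p')

# grep-like: value of every "a" attribute on a textnode.
attrs=$(printf '%s\n' "$flat" | sed -n 's!^/foo/textnode/@a=!!p')

# sed-like: rewrite the attribute value, leaving all other lines alone.
edited=$(printf '%s\n' "$flat" | sed 's!^\(/foo/textnode/@a\)=bv$!\1=changed!')

printf '%s\n' "$values"
printf '%s\n' "$attrs"
```

Anchoring each pattern at the start of the line (^) keeps a match on the element path from also catching attribute lines such as /foo/textnode/@a. The edited lines could then be piped to 2xml to get XML back, as in the pipeline above.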
P.S. There are also html2 / 2html.