Need help using the grep command in terminal

I'm trying to extract the value of title in this XML snippet. It is a real hassle to do it with Python, so I thought maybe a simple grep and some regex would be sufficient. But I'm having a hard time with the expression.

I have managed to create this expression on regexr.com: title=(["'])(?:(?=(\\?))\2.)*?\1, but I cannot get it to run in the terminal.

In this example snippet, I'm trying to extract: This is what i want to be extracted

<Video ratingKey="4" key="/library/metadata/4" guid="com.agents.imdb://tt0322259?lang=en" studio="Mikona Productions GmbH &amp; Co. KG" type="movie" title="This is what i want to be extracted" contentRating="PG-13" summary="It&#39;s a major double-cross when former police officer Brian O&#39;Conner teams up with his ex-con buddy Roman Pearce to transport a shipment of &#34;dirty&#34; money for shady Miami-based import-export dealer Carter Verone. But the guys are actually working with undercover agent Monica Fuentes to bring Verone down." rating="3.6" audienceRating="5.0" year="2003" tagline="How Fast Do You Want It?">
<Part id="4" key="/library/parts/4/1534795606/file.mkv" duration="6455870" file="/media/movies/file.mkv" size="14015931289" audioProfile="ma" container="mkv" videoProfile="high" />
</Media>
<Genre tag="Action" />
<Genre tag="Crime" />
</Video>

I have tried multiple iterations of

grep file.xml -e "title=([\"'])(?:(?=(\\?))\2.)*?\1"

grep can't do replacements, but sed can. With your text snippet,

sed -ne 's/.*title="\([^"]*\)".*/\1/p' file.xml

should work.
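
As for why the original attempt fails: without options, grep uses basic regular expressions, which have no non-capturing groups (?:...), lookaheads or lazy quantifiers such as *?. If you would rather stay with grep, a rough sketch is to let -o print only the matching attribute and then strip the wrapper with sed. The function name extract_title is made up for illustration, and the pattern assumes double-quoted values as in your snippet. Note that -o is not part of POSIX, although both GNU and BSD grep support it.

```shell
# Hypothetical helper: print the value of each title="..." attribute in a file.
# Assumes double-quoted attribute values, as in the snippet above.
extract_title() {
    # -o prints only the matched text; the leading space in the pattern
    # avoids matching attributes whose names merely end in "title".
    grep -o ' title="[^"]*"' "$1" |
        sed -e 's/^ title="//' -e 's/"$//'
}

# Usage: extract_title file.xml
```

Unlike the single sed expression, this prints one line per matching attribute, so a line containing several title attributes yields several results.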

Using regular expressions can be a bit tricky for handling XML - a simpler option may be to use xmllint.

Using your example:

xmllint --xpath "string(/Video/@title)" file.xml

Output:

This is what i want to be extracted

The XPath selects the title attribute of Video and converts it to a string.

Note that your XML snippet is not well formed - you have a closing </Media> but no opening <Media> tag.
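
Because of that, xmllint will most likely refuse to parse the snippet exactly as posted. One possible way to cope, sketched here under the assumption that falling back to the sed one-liner is acceptable, is to test well-formedness first with --noout. get_title is a hypothetical helper name, and /Video/@title assumes Video is the root element, as in the example.

```shell
# Hypothetical helper: prefer xmllint, fall back to sed for malformed input.
get_title() {
    # --noout parses the file without printing it; a non-zero exit status
    # means the file is not well-formed (or xmllint is not installed).
    if command -v xmllint >/dev/null 2>&1 && xmllint --noout "$1" 2>/dev/null; then
        xmllint --xpath 'string(/Video/@title)' "$1"
    else
        # Fallback: plain text match, assumes double-quoted attribute values.
        sed -ne 's/.*title="\([^"]*\)".*/\1/p' "$1"
    fi
}

# Usage: get_title file.xml
```

Keep in mind that the sed fallback returns the raw text, so entities such as &amp; stay encoded, whereas the XPath string() result is already decoded.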
