Need help using the grep command in the terminal
I'm trying to extract the value of title in this XML snippet. It is a real hassle to do it with Python, so I thought maybe a simple grep and some regex would be sufficient, but I'm having a hard time with the expression.
I have managed to create this expression on regexr.com:
title=(["'])(?:(?=(\\?))\2.)*?\1
but I can't manage to run it in the terminal.
In this example snippet, I'm trying to extract "This is what i want to be extracted".
<Video ratingKey="4" key="/library/metadata/4" guid="com.agents.imdb://tt0322259?lang=en" studio="Mikona Productions GmbH &amp; Co. KG" type="movie" title="This is what i want to be extracted" contentRating="PG-13" summary="It's a major double-cross when former police officer Brian O'Conner teams up with his ex-con buddy Roman Pearce to transport a shipment of &quot;dirty&quot; money for shady Miami-based import-export dealer Carter Verone. But the guys are actually working with undercover agent Monica Fuentes to bring Verone down." rating="3.6" audienceRating="5.0" year="2003" tagline="How Fast Do You Want It?">
<Part id="4" key="/library/parts/4/1534795606/file.mkv" duration="6455870" file="/media/movies/file.mkv" size="14015931289" audioProfile="ma" container="mkv" videoProfile="high" />
</Media>
<Genre tag="Action" />
<Genre tag="Crime" />
</Video>
I have tried multiple iterations of:
grep file.xml -e "title=([\"'])(?:(?=(\\?))\2.)*?\1"
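grep can't do replacements, and by default it doesn't understand Perl-style constructs such as the (?=...) lookahead and the lazy *? used in your pattern. Using your text snippet,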
sed -ne 's/.*title="\([^"]*\)".*/\1/p' file.xml
should work.
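If you would rather stay with grep, GNU grep can print just part of a match: -P switches to Perl-compatible regexes (this option is missing from some builds and from BSD/macOS grep), -o prints only the matched text, and \K drops everything matched before it from the output. Your original pattern would also need -P, and it has to be single-quoted, since inside double quotes the shell turns \\? into \? before grep sees it. A minimal sketch, assuming double-quoted attribute values and a hypothetical temporary sample file; the second command is a more portable fallback built from grep -o and cut:

```shell
#!/bin/sh
# Hypothetical sample: a trimmed, well-formed copy of the question's XML,
# written to a temporary file so no real file.xml gets overwritten.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
<Video ratingKey="4" type="movie" title="This is what i want to be extracted" year="2003">
  <Genre tag="Action" />
</Video>
EOF

# GNU grep only: -P = Perl-compatible regex, -o = print only the match,
# \K = forget the 'title="' prefix so only the value is printed.
title_pcre=$(grep -oP 'title="\K[^"]*' "$tmp")

# More portable: print the whole 'title="..."' match, then keep the
# second field when the line is split on double quotes.
title_portable=$(grep -o 'title="[^"]*"' "$tmp" | cut -d '"' -f 2)

printf '%s\n' "$title_pcre" "$title_portable"
```

Like the sed version, neither command handles single-quoted values or a title attribute split across lines, because grep works one line at a time.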
Regular expressions can be tricky for handling XML, so a simpler option may be to use xmllint.
Using your example:
xmllint --xpath "string(/Video/@title)" file.xml
Output:
This is what i want to be extracted
The XPath selects the title attribute of Video and converts it to a string.
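To see the XPath pieces in isolation, here is a minimal sketch against a hypothetical, well-formed temporary file trimmed from the question, assuming an xmllint that supports --xpath as in the command above. Note that string() applied to a node-set uses only the first node in document order, so when there are several matches (the two Genre elements here), count() and a positional predicate such as (//Genre)[2] pick them out individually:

```shell
#!/bin/sh
# Hypothetical well-formed sample trimmed from the question.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat > "$tmp" <<'EOF'
<Video ratingKey="4" type="movie" title="This is what i want to be extracted" year="2003">
  <Genre tag="Action" />
  <Genre tag="Crime" />
</Video>
EOF

# string() converts the selected attribute node to its text value.
title=$(xmllint --xpath 'string(/Video/@title)' "$tmp")

# //Video also matches Video elements nested below the root element.
title_anywhere=$(xmllint --xpath 'string(//Video/@title)' "$tmp")

# string() would only see the first Genre, so count them and index one.
genres=$(xmllint --xpath 'count(//Genre)' "$tmp")
second_genre=$(xmllint --xpath 'string((//Genre)[2]/@tag)' "$tmp")

printf 'title: %s\n' "$title"
printf '%s Genre elements, the second is %s\n' "$genres" "$second_genre"
```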
Note that your XML snippet is not well-formed: you have a closing </Media> tag but no opening <Media> tag.
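xmllint can also confirm whether a file is well-formed before you query it. A minimal sketch with two hypothetical temporary files: one reproduces the stray </Media>, the other is fixed by adding an opening <Media> (removing the closing tag would work just as well, depending on what the real file should contain). With --noout, xmllint only parses, reports errors on stderr, and exits with a non-zero status when parsing fails:

```shell
#!/bin/sh
broken=$(mktemp)
fixed=$(mktemp)
trap 'rm -f "$broken" "$fixed"' EXIT

# Same problem as the question: </Media> closes an element that was never opened.
cat > "$broken" <<'EOF'
<Video title="This is what i want to be extracted">
<Part id="4" />
</Media>
</Video>
EOF

# One possible fix (an assumption): wrap Part in a matching <Media> pair.
cat > "$fixed" <<'EOF'
<Video title="This is what i want to be extracted">
<Media>
<Part id="4" />
</Media>
</Video>
EOF

# --noout: parse only and print nothing on success; the exit status tells
# us whether the document is well-formed.
if xmllint --noout "$broken" 2>/dev/null; then broken_ok=yes; else broken_ok=no; fi
if xmllint --noout "$fixed"; then fixed_ok=yes; else fixed_ok=no; fi

echo "broken well-formed: $broken_ok, fixed well-formed: $fixed_ok"
```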