Extracting a specific string after a given string from HTML file using a bash script
I can't sensibly advise doing this, because parsing HTML with regex is not likely to end well, but you might be able to get the string MANIKA with
sed -nr '/MOM:/ s/.*MOM:([^"]+).*/\1/p' file
It works OK on your sample anyway...
Notes
- -n : don't print anything until we ask for it
- -r : use ERE (extended regular expressions)
- /string/ : find lines with string
- s/old/new/ : replace old with new
- .* : any number of any characters
- ([^"]+) : save one or more characters that are not "
- \1 : backreference to saved characters
- p : print just the lines we changed
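To try the sed command in isolation, here is a minimal sketch. The sample line is made up for illustration (the HTML from the question is not reproduced here), and the variable names sample and result are hypothetical.

```shell
# Hypothetical input: one line containing MOM:<value> inside a quoted attribute,
# plus one unrelated line that should be ignored.
sample=$(mktemp)
printf '%s\n' \
  '<td><input type="hidden" value="MOM:MANIKA"></td>' \
  '<td>something else</td>' > "$sample"

# /MOM:/ selects matching lines; the substitution keeps only the run of
# non-quote characters after MOM:, and p prints the changed line.
result=$(sed -nr '/MOM:/ s/.*MOM:([^"]+).*/\1/p' "$sample")
echo "$result"   # prints MANIKA

rm -f "$sample"
```

Because .* is greedy, a line with several MOM: occurrences would yield only the text after the last one.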
grep -Po 'MOM:\K[^"]+' file.html
Warning: this is not a very robust solution, and your HTML is not valid.
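A small sketch of how the grep one-liner behaves, assuming GNU grep built with PCRE support (needed for -P). The input line and the variable names sample and match are invented for illustration.

```shell
# Hypothetical input line; the real file from the question is not shown here.
sample=$(mktemp)
printf '%s\n' '<input type="hidden" value="MOM:MANIKA">' > "$sample"

# -P: Perl-compatible regex, -o: print only the matched part.
# \K drops everything matched so far, so MOM: itself is not printed.
match=$(grep -Po 'MOM:\K[^"]+' "$sample")
echo "$match"   # prints MANIKA

rm -f "$sample"
```

Unlike the sed version, grep -o prints every match on a line, each on its own output line.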