How to grep words from a single string matching a pattern?
Can anyone guide me on how to grep only the words containing the pattern _ARA from the single string below?
String:
LINK:['IM219MIR_ARA1','IM18Q4_ARA1','SM18Q4_ARA1','IM18PLANNING_ARA1','IM118Q4DYNVA_ARA1','IM218Q4DYNVA_ARA1','IM119EIOPALTG_ARA1','IM219EIOPALTG_ARA1','SM119EIOPALTG_ARA1']}
Expected output:
IM219MIR_ARA1
IM18Q4_ARA1
SM18Q4_ARA1
IM18PLANNING_ARA1
IM118Q4DYNVA_ARA1
IM218Q4DYNVA_ARA1
IM119EIOPALTG_ARA1
IM119EIOPALTG_ARA1
IM219EIOPALTG_ARA1
SM119EIOPALTG_ARA1
grep accepts -o to print only matching text, on separate lines even if the matches came from the same line. It also accepts -w to force the regular expression to match an entire word (or not match at all), where a word is a maximal sequence of letters, numerals, and underscores. So you can simply use:
grep -ow '\w*_ARA\w*'
In this case you can actually omit the -w option if you like, and get the same result, since the regular expression here is explicitly matching only word characters with \w.
That will read from standard input because there are no filename arguments. If the text you showed is in a file (say, one called input.txt), then you would pass that as an argument:
grep -ow '\w*_ARA\w*' input.txt
This outputs:
IM219MIR_ARA1
IM18Q4_ARA1
SM18Q4_ARA1
IM18PLANNING_ARA1
IM118Q4DYNVA_ARA1
IM218Q4DYNVA_ARA1
IM119EIOPALTG_ARA1
IM219EIOPALTG_ARA1
SM119EIOPALTG_ARA1
Technically, the output this produces is slightly different from what you showed in your question, because the expected output you showed lists IM119EIOPALTG_ARA1 twice, even though it appears only once in the text you showed. I presume this is a mistake and you actually want it just once.
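To check the points above on the sample string, here is a minimal sketch. It assumes GNU grep (for -o, -w, and \w); the variable names and the temporary file created with mktemp are only illustrative stand-ins, not part of the original question.

```shell
# Sample string from the question; the variable name "input" is only for this sketch.
input="LINK:['IM219MIR_ARA1','IM18Q4_ARA1','SM18Q4_ARA1','IM18PLANNING_ARA1','IM118Q4DYNVA_ARA1','IM218Q4DYNVA_ARA1','IM119EIOPALTG_ARA1','IM219EIOPALTG_ARA1','SM119EIOPALTG_ARA1']}"

# Keep a copy in a temporary file so it can also be passed as a filename argument
# (this stands in for input.txt).
tmpfile=$(mktemp)
printf '%s\n' "$input" > "$tmpfile"

# 1. No filename argument: grep reads from standard input.
from_stdin=$(printf '%s\n' "$input" | grep -ow '\w*_ARA\w*')

# 2. Filename argument: grep reads the file.
from_file=$(grep -ow '\w*_ARA\w*' "$tmpfile")

# 3. Same pattern without -w.
without_w=$(grep -o '\w*_ARA\w*' "$tmpfile")

rm -f "$tmpfile"

# Print the matches, one per line.
printf '%s\n' "$from_file"

if [ "$from_stdin" = "$from_file" ]; then
    echo "stdin and filename argument give the same matches"
fi
if [ "$from_file" = "$without_w" ]; then
    echo "-w makes no difference for this pattern"
fi
```

Both messages should appear for this input, because every match is already delimited by quote characters, which are not word characters.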
If you want to use the cut and sed commands, use this:
<test.txt cut -d'[' -f2 | cut -d']' -f1 | sed "s/,'/\\n/g" | sed 's/.$//' | cut -d\' -f2 | grep _ARA
Explanation in 2 parts:
- grep _ARA selects the lines that contain _ARA
- cut -d'[' -f2 removes the characters before your words; likewise, cut -d']' -f1 removes what comes after them
- sed "s/,'/\\n/g" puts each word on its own line
- <test.txt is just a redirection that feeds the file to cut (and, through the pipeline, to grep)
After these four commands, the result is:
'IM219MIR_ARA1'
IM18Q4_ARA1'
SM18Q4_ARA1'
IM18PLANNING_ARA1'
IM118Q4DYNVA_ARA1'
IM218Q4DYNVA_ARA1'
IM119EIOPALTG_ARA1'
IM219EIOPALTG_ARA1'
SM119EIOPALTG_ARA1'
So, to remove the ' at the end of each word, we add sed 's/.$//' and, for the first ', we use cut -d\' -f2.
So the final result is:
IM219MIR_ARA1
IM18Q4_ARA1
SM18Q4_ARA1
IM18PLANNING_ARA1
IM118Q4DYNVA_ARA1
IM218Q4DYNVA_ARA1
IM119EIOPALTG_ARA1
IM219EIOPALTG_ARA1
SM119EIOPALTG_ARA1
If you want more details about this command, you can read my discussion with Eliah Kagan.
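For anyone who wants to replay the cut and sed pipeline above stage by stage, here is a rough sketch. It assumes GNU sed (which accepts \n in the replacement text) and uses a temporary file from mktemp as a stand-in for test.txt; the names tmp, stage1, and stage2 are made up for this example.

```shell
# Write the sample string from the question to a temporary file (stand-in for test.txt).
tmp=$(mktemp)
printf '%s\n' "LINK:['IM219MIR_ARA1','IM18Q4_ARA1','SM18Q4_ARA1','IM18PLANNING_ARA1','IM118Q4DYNVA_ARA1','IM218Q4DYNVA_ARA1','IM119EIOPALTG_ARA1','IM219EIOPALTG_ARA1','SM119EIOPALTG_ARA1']}" > "$tmp"

# Stage 1: keep what lies between [ and ], then start a new line at every ,'
# At this point the first line still has both quotes and the others a trailing quote.
stage1=$(<"$tmp" cut -d'[' -f2 | cut -d']' -f1 | sed "s/,'/\\n/g")
printf '%s\n' "$stage1"

echo "----"

# Stage 2: drop the last character of each line (the trailing quote), take the
# text after the leading quote on the first line, and keep only lines with _ARA.
# cut prints lines that contain no delimiter unchanged, so the other lines pass through.
stage2=$(printf '%s\n' "$stage1" | sed 's/.$//' | cut -d\' -f2 | grep _ARA)
printf '%s\n' "$stage2"

rm -f "$tmp"
```

Splitting the pipeline this way makes it easier to see which command removes which character.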