How to grep words from a single string matching a pattern?

Can anyone show me how to grep only the words containing the pattern _ARA from the single string below?

String:

LINK:['IM219MIR_ARA1','IM18Q4_ARA1','SM18Q4_ARA1','IM18PLANNING_ARA1','IM118Q4DYNVA_ARA1','IM218Q4DYNVA_ARA1','IM119EIOPALTG_ARA1','IM219EIOPALTG_ARA1','SM119EIOPALTG_ARA1']}

Expected output:

IM219MIR_ARA1
IM18Q4_ARA1
SM18Q4_ARA1
IM18PLANNING_ARA1
IM118Q4DYNVA_ARA1
IM218Q4DYNVA_ARA1
IM119EIOPALTG_ARA1
IM119EIOPALTG_ARA1
IM219EIOPALTG_ARA1
SM119EIOPALTG_ARA1

grep accepts -o to print only matching text, on separate lines even if the matches came from the same line. It also accepts -w to force the regular expression to match an entire word (or not match at all), where a word is a maximal sequence of letters, numerals, and underscores. So you can simply use:

grep -ow '\w*_ARA\w*'

In this case you can omit the -w option and get the same result. The regular expression matches only word characters (\w), and because grep picks the leftmost, longest match, each match already spans the whole word on both sides of _ARA.
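A quick way to check this on the sample string, without creating a file, is to pipe it in with printf. This is a sketch that assumes GNU grep (the usual grep on Linux), since -o, -w and \w are GNU extensions rather than POSIX; the variable names are only for illustration.

```shell
#!/bin/sh
# The sample line from the question, held in a variable for illustration.
line="LINK:['IM219MIR_ARA1','IM18Q4_ARA1','SM18Q4_ARA1','IM18PLANNING_ARA1','IM118Q4DYNVA_ARA1','IM218Q4DYNVA_ARA1','IM119EIOPALTG_ARA1','IM219EIOPALTG_ARA1','SM119EIOPALTG_ARA1']}"

# With -w: each match must be a whole word.
with_w=$(printf '%s\n' "$line" | grep -ow '\w*_ARA\w*')

# Without -w: the pattern only matches word characters, and the
# leftmost, longest match already covers the whole word.
without_w=$(printf '%s\n' "$line" | grep -o '\w*_ARA\w*')

printf '%s\n' "$with_w"

if [ "$with_w" = "$without_w" ]; then
    echo "same result with and without -w"
fi
```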

Without filename arguments, grep reads from standard input. If the text you showed is in a file (say, one called input.txt), pass that file as an argument:

grep -ow '\w*_ARA\w*' input.txt

This outputs:

IM219MIR_ARA1
IM18Q4_ARA1
SM18Q4_ARA1
IM18PLANNING_ARA1
IM118Q4DYNVA_ARA1
IM218Q4DYNVA_ARA1
IM119EIOPALTG_ARA1
IM219EIOPALTG_ARA1
SM119EIOPALTG_ARA1

Technically, the output this produces is slightly different from what you showed in your question, because the expected output you showed lists IM119EIOPALTG_ARA1 twice, even though it appears only once in the text you showed. I presume this is a mistake and you actually want it just once.
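To try the file-based form, here is a sketch that writes the sample line to a temporary file (created with mktemp, which stands in for input.txt) and runs the same command on it. It also counts how often IM119EIOPALTG_ARA1 is matched, which shows that it occurs only once in the text. As above, GNU grep is assumed.

```shell
#!/bin/sh
# Temporary file standing in for input.txt; removed when the script exits.
tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT

printf '%s\n' "LINK:['IM219MIR_ARA1','IM18Q4_ARA1','SM18Q4_ARA1','IM18PLANNING_ARA1','IM118Q4DYNVA_ARA1','IM218Q4DYNVA_ARA1','IM119EIOPALTG_ARA1','IM219EIOPALTG_ARA1','SM119EIOPALTG_ARA1']}" > "$tmpfile"

# Same command as above, with the file passed as an argument.
matches=$(grep -ow '\w*_ARA\w*' "$tmpfile")
printf '%s\n' "$matches"

# Count the lines that are exactly IM119EIOPALTG_ARA1.
count=$(printf '%s\n' "$matches" | grep -c '^IM119EIOPALTG_ARA1$')
echo "IM119EIOPALTG_ARA1 appears $count time(s)"
```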


If you want to use cut and sed instead, you can use this:

<test.txt cut -d'[' -f2 | cut -d']' -f1 | sed "s/,'/\\n/g" | sed 's/.$//' | cut -d\' -f2 | grep _ARA

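Explanation, in two parts:

  • cut -d'[' -f2 removes everything up to and including the [ before your words, and cut -d']' -f1 likewise removes the ] and everything after it
  • sed "s/,'/\\n/g" replaces each ,' separator with a newline, putting each word on its own line (GNU sed treats \n in the replacement as a newline; other versions of sed may not)
  • grep _ARA keeps only the lines that contain _ARA
  • <test.txt is just an input redirection: it feeds test.txt to the standard input of the first cut command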

After the two cut commands, the first sed and grep, the result is:

'IM219MIR_ARA1'
IM18Q4_ARA1'
SM18Q4_ARA1'
IM18PLANNING_ARA1'
IM118Q4DYNVA_ARA1'
IM218Q4DYNVA_ARA1'
IM119EIOPALTG_ARA1'
IM219EIOPALTG_ARA1'
SM119EIOPALTG_ARA1'

So, to remove the ' at the end of each word, we add

sed 's/.$//'

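and to remove the ' before the first word, we use the following (cut prints lines that contain no ' unchanged, so the other words are not affected):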

cut -d\' -f2
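The two trimming steps can be checked on their own. This sketch fakes the intermediate output with two lines copied from it, then applies sed 's/.$//' (drop the last character of each line) and cut -d\' -f2 (keep the text after the first '). The variable names are only for illustration.

```shell
#!/bin/sh
# Two lines shaped like the intermediate output: the first word still has
# a leading quote, and every word has a trailing quote.
intermediate="'IM219MIR_ARA1'
IM18Q4_ARA1'"

# Step 1: remove the last character (the trailing ') from each line.
no_trailing=$(printf '%s\n' "$intermediate" | sed 's/.$//')
printf '%s\n' "$no_trailing"

# Step 2: split on ' and keep field 2. The first line splits into an
# empty field and IM219MIR_ARA1, so field 2 is the word. The second line
# contains no ', so cut prints it unchanged (it would drop it only with -s).
result=$(printf '%s\n' "$no_trailing" | cut -d\' -f2)
printf '%s\n' "$result"
```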

So the final result is:

IM219MIR_ARA1
IM18Q4_ARA1
SM18Q4_ARA1
IM18PLANNING_ARA1
IM118Q4DYNVA_ARA1
IM218Q4DYNVA_ARA1
IM119EIOPALTG_ARA1
IM219EIOPALTG_ARA1
SM119EIOPALTG_ARA1

If you want more details about this command, you can read my discussion with Eliah Kagan.
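For a self-contained check of the whole pipeline, this sketch writes the sample line to a temporary file standing in for test.txt and runs the command above unchanged (assuming GNU sed for the \n step). It also runs a variant that is not part of the command above: it swaps in tr for the sed steps (tr ',' '\n' to split on commas, tr -d "'" to delete every quote), which may help where sed does not turn \n into a newline. The variant assumes the words themselves never contain a comma or a quote.

```shell
#!/bin/sh
# Temporary file standing in for test.txt; removed when the script exits.
tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT
printf '%s\n' "LINK:['IM219MIR_ARA1','IM18Q4_ARA1','SM18Q4_ARA1','IM18PLANNING_ARA1','IM118Q4DYNVA_ARA1','IM218Q4DYNVA_ARA1','IM119EIOPALTG_ARA1','IM219EIOPALTG_ARA1','SM119EIOPALTG_ARA1']}" > "$tmpfile"

# The cut/sed pipeline from above, with only the file name changed.
with_sed=$(<"$tmpfile" cut -d'[' -f2 | cut -d']' -f1 | sed "s/,'/\\n/g" | sed 's/.$//' | cut -d\' -f2 | grep _ARA)

# Variant: tr splits on commas and then deletes every single quote.
with_tr=$(<"$tmpfile" cut -d'[' -f2 | cut -d']' -f1 | tr ',' '\n' | tr -d "'" | grep _ARA)

printf '%s\n' "$with_sed"

if [ "$with_sed" = "$with_tr" ]; then
    echo "both pipelines agree"
fi
```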