How to find all patterns between two characters?
I'm trying to find all patterns between a pair of double quotes. Let's say I have a file whose contents look like the following:
first matched is "One". the second is here"Two "
and here are in second line" Three ""Four".
I want the words below as output:
One
Two
Three
Four
As you can see all strings in output are between a pair of quotes.
What I tried is this command:
grep -Po ' "\K[^"]*' file
The above command works only if there is a space before the opening " mark of each pair. For example, it works if my input file contains the following:
first matched is "One". the second is here "Two "
and here are in second line " Three " "Four".
I know I can do this by combining multiple commands, but I'm looking for a single command that is not run more than once, i.e. not something like the command below:
grep -oP '"[^"]*"' file | grep -oP '[^"]*'
How can I print all of my patterns using just one command?
Reply to comments: Removing whitespace around the matched pattern inside a pair of quotes is not important to me, but it would be better if the command supported it too. Also, my files contain nested quotes like "foo "bar" zoo". Each quoted string sits on a single line; none of them spans multiple lines.
Thanks in advance.
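First of all, your grep -Po '"\K[^"]*' file idea fails because grep sees both "One" and ". the second is here" as being inside quotes: the match stops just before the closing quote, so that quote is then free to act as the opening quote of the next match. Personally, I'd probably just do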
$ grep -oP '"[^"]+"' file | tr -d '"'
One
Two
Three
Four
But that is two commands. To do it with a single command, you could use one of:
-
Perl
$ perl -lne '@F=/"\s*([^"]+)\s*"/g; print for @F' file
One
Two
Three
Four
Here, the @F array holds all matches of the regex (a quote, followed by as many non-" characters as possible until the next "). The print for @F just means "print each element of @F".
-
Perl
$ perl -F'"' -lane 'for($i=1;$i<=$#F;$i+=2){print $F[$i]}' file
One
Two
Three
Four
To remove leading/trailing spaces from each match, use this:
perl -F'"' -lane 'for($i=1;$i<=$#F;$i+=2){$F[$i]=~s/^\s+|\s+$//g; print $F[$i]}' file
Here, Perl is behaving like awk. The -a switch causes it to automatically split input lines into fields on the character given by -F. Since I have given it ", the fields are:
$ perl -F'"' -lane 'for($i=0;$i<=$#F;$i++){print "Field $i: $F[$i]"}' file
Field 0: first matched is
Field 1: One
Field 2: . the second is here
Field 3: Two
Field 0: and here are in second line
Field 1: Three
Field 2:
Field 3: Four
Field 4: .
Because we are looking for text between two consecutive field separators, we know we want every second field. So, for($i=1;$i<=$#F;$i+=2){print $F[$i]} will print the ones we care about.
-
The same idea but in awk:
$ awk -F'"' '{for(i=2;i<=NF;i+=2){print $(i)}}' file
One
Two
Three
Four
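The awk command above prints each field exactly as it appears, including any spaces just inside the quotes. As a minimal sketch of how the optional trimming from the question could be added, the function below strips leading and trailing blanks from each even-numbered field. The name quoted_trimmed is hypothetical, and two choices are assumptions rather than part of the original answer: the loop stops at i < NF so that text after an unclosed final quote is skipped, and empty strings are not printed.

```shell
# Hypothetical helper: print every quoted string, trimmed, one per line.
quoted_trimmed() {
    awk -F'"' '{
        # Even-numbered fields sit between an opening and a closing quote.
        # Stopping at i < NF skips text after an unclosed final quote.
        for (i = 2; i < NF; i += 2) {
            s = $i
            gsub(/^[ \t]+|[ \t]+$/, "", s)   # strip leading and trailing blanks
            if (s != "") print s
        }
    }' "$@"
}

# Sample input taken from the question.
printf '%s\n' \
    'first matched is "One". the second is here"Two "' \
    'and here are in second line" Three ""Four".' | quoted_trimmed
```

With the sample input from the question this should print One, Two, Three and Four on separate lines; a file name can be passed as an argument instead of piping input.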
The key is to consume the quotes in your expression. Hard to do that with a single grep command. Here's a perl one-liner:
perl -0777 -nE 'say for /"(.*?)"/sg' file
That slurps the whole input and prints out the captured part of each match. It will work even if there's a newline inside the quotes, although it then becomes difficult to tell a newline inside an element from the newline that separates two elements. To help with that, use a different character as the output record separator, the null character for instance:
perl -0777 -lne 'print for /"(.*?)"/sg} BEGIN {$\="\0"' <<DATA | od -c
blah "first" blah "second
quote with newline" blah "third"
DATA
0000000   f   i   r   s   t  \0   s   e   c   o   n   d  \n   q   u   o
0000020   t   e       w   i   t   h       n   e   w   l   i   n   e  \0
0000040   t   h   i   r   d  \0
0000046
This is possible with the grep one-liner below, assuming that the quotation marks on each line are balanced.
grep -oP '"\s*\K[^"]+?(?=\s*"(?:[^"]*"[^"]*")*[^"]*$)' file
Example:
$ cat file
first matched is "One". the second is here"Two "
and here are in second line" Three ""Four".
$ grep -oP '"\s*\K[^"]+?(?=\s*"(?:[^"]*"[^"]*")*[^"]*$)' file
One
Two
Three
Four
Another, hair-pulling solution uses the PCRE verbs (*SKIP)(*F):
$ grep -oP '[^"]+(?=(?:"[^"]*"[^"]*)*[^"]*$)(*SKIP)(*F)|\s*\K[^"]+(?=\b\s*)' file
One
Two
Three
Four