How to use sed/grep to extract text between two words?
I am trying to output a string that contains everything between two words of a string:
input:
"Here is a String"
output:
"is a"
Using:
sed -n '/Here/,/String/p'
includes the endpoints, but I don't want to include them.
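For context: sed -n '/Here/,/String/p' selects a range of whole lines, starting at a line that matches Here and ending at the next line that matches String. That is why it prints complete lines instead of the text between the two words. The sketch below uses a made-up three-line input. Because sed only checks the end pattern from the line after the start, a line containing both words does not close the range on its own:

```shell
# Hypothetical three-line input; the middle line contains both words.
lines=$(printf '%s\n' 'before' 'Here is a String' 'after' | sed -n '/Here/,/String/p')

# The range opens on "Here is a String". No later line matches String,
# so the range runs to the end of the input.
printf '%s\n' "$lines"
# → Here is a String
# → after
```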
GNU grep also supports positive and negative lookahead and lookbehind when run with -P (Perl-compatible regular expressions). For your case, the command would be:
echo "Here is a string" | grep -o -P '(?<=Here).*(?=string)'
If there are multiple occurrences of Here and string, you can choose to match from the first Here to the last string, or to match each pair individually. In regex terms, the first is a greedy match and the second is a non-greedy (lazy) match:
$ echo 'Here is a string, and Here is another string.' | grep -oP '(?<=Here).*(?=string)' # Greedy match
is a string, and Here is another
$ echo 'Here is a string, and Here is another string.' | grep -oP '(?<=Here).*?(?=string)' # Non-greedy match (Notice the '?' after '*' in .*)
is a
is another
sed -e 's/Here\(.*\)String/\1/'
The accepted answer does not remove text that could appear before Here or after String. This will:
sed -e 's/.*Here\(.*\)String.*/\1/'
The main difference is the addition of .* immediately before Here and after String.
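The sketch below compares the two sed commands on a hypothetical input that has extra text on both sides. The brackets in the output only serve to make the spaces visible:

```shell
# Hypothetical input with text before "Here" and after "String".
input='Say: Here is a String, done'

# Only the "Here...String" part is replaced, so the surrounding text survives.
without_anchors=$(printf '%s\n' "$input" | sed -e 's/Here\(.*\)String/\1/')

# The leading and trailing .* consume the rest of the line as well.
with_anchors=$(printf '%s\n' "$input" | sed -e 's/.*Here\(.*\)String.*/\1/')

printf '[%s]\n' "$without_anchors" "$with_anchors"
# → [Say:  is a , done]
# → [ is a ]
```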
You can strip the unwanted parts in Bash alone, using parameter expansion:
$ foo="Here is a String"
$ foo=${foo##*Here }
$ echo "$foo"
is a String
$ foo=${foo%% String*}
$ echo "$foo"
is a
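The two expansions above can be wrapped in a small function. This is only a sketch: the name between and its argument order are hypothetical. It uses # (shortest prefix) where the session above uses ##, so it stops at the first start marker. If a marker is not found, the corresponding expansion leaves the string unchanged.

```shell
# Hypothetical helper: print the text between the first $2 and the following $3.
between() {
  local s=$1 start=$2 end=$3
  s=${s#*"$start"}   # drop everything up to and including the FIRST "$start"
  s=${s%%"$end"*}    # drop everything from the FIRST remaining "$end" onward
  printf '%s\n' "$s"
}

between 'Here is a String' 'Here ' ' String'                                # → is a
between 'Here is a String, and Here is another String.' 'Here ' ' String'   # → is a
```

If you use ## instead of #, as the session above does, the second call prints is another, because ## strips everything up to the last "Here ".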
And if you have a GNU grep that includes PCRE, you can use a zero-width assertion:
$ echo "Here is a String" | grep -Po '(?<=(Here )).*(?= String)'
is a
With GNU awk:
$ echo "Here is a string" | awk -v FS="(Here|string)" '{print $2}'
is a
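Because the field separator is a regular expression, the same idea extends to lines with several Here ... string pairs: the text inside each pair ends up in the even-numbered fields. The sketch below assumes the two words strictly alternate on each line. It also trims the spaces next to the delimiters. The helper name pairs is hypothetical.

```shell
# Hypothetical helper: print every chunk between "Here" and "string" on each input line.
pairs() {
  awk -v FS='(Here|string)' '{
    for (i = 2; i <= NF; i += 2) {  # fields 2, 4, ... lie between a "Here" and the next "string"
      f = $i
      sub(/^ +/, "", f)             # trim spaces left after "Here"
      sub(/ +$/, "", f)             # trim spaces left before "string"
      print f
    }
  }'
}

printf '%s\n' 'Here is a string, and Here is another string.' | pairs
# → is a
# → is another
```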
grep with the -P (Perl-compatible regex) option supports \K, which discards everything matched so far from the reported match. In our case, the text matched before \K is Here, so it is dropped from the output.
$ echo "Here is a string" | grep -oP 'Here\K.*(?=string)'
is a
$ echo "Here is a string" | grep -oP 'Here\K(?:(?!string).)*'
is a
If you want the output to be "is a" without the leading and trailing spaces, you could try the following:
$ echo "Here is a string" | grep -oP 'Here\s*\K.*(?=\s+string)'
is a
$ echo "Here is a string" | grep -oP 'Here\s*\K(?:(?!\s+string).)*'
is a
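One reason to use \K here is that PCRE requires a lookbehind to have a fixed length (or, in recent versions, a bounded one). A pattern such as (?<=Here\s*) is therefore rejected, whereas Here\s*\K can consume any amount of whitespace. The sketch below uses a made-up input with several spaces after Here. The lazy .*? makes the match stop at the first string:

```shell
# Hypothetical input with a variable run of spaces after "Here".
s='Here    is a string'

# "Here" and all the spaces after it are matched and then discarded by \K.
# The lookahead stops the match before the whitespace that precedes "string".
result=$(printf '%s\n' "$s" | grep -oP 'Here\s*\K.*?(?=\s+string)')

printf '%s\n' "$result"   # → is a
```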