Extract a string from a line between positions given by a pattern in another line
Using awk:
$ awk '!seen{match($0, /A.*B/);seen=1;next} {print substr($0,RSTART,RLENGTH);seen=0}' infile
7890MNOP
34567890MNOPQRST
Explanation: read in man awk:
RSTART
The index of the first character matched by match(); 0 if no
match. (This implies that character indices start at one.)
RLENGTH
The length of the string matched by match(); -1 if no match.
match(s, r [, a])
Return the position in s where the regular expression r occurs,
or 0 if r is not present, and set the values of RSTART and RLENGTH. (...)
substr(s, i [, n])
Return the at most n-character substring of s starting at i.
If n is omitted, use the rest of s.
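To make the man page excerpts concrete, here is a minimal sketch of the same awk program run on made-up data. The function names and the sample lines are assumptions for illustration only, since the original input file is not shown here.

```shell
# Hypothetical helper: report what match() stores for one index line.
show_match() {
    printf '%s\n' "$1" | awk '{ match($0, /A.*B/); print RSTART, RLENGTH }'
}

# Hypothetical wrapper around the awk command from the answer above.
# Odd lines are index lines, even lines are the content to slice.
extract_between() {
    awk '!seen { match($0, /A.*B/); seen = 1; next }
         { print substr($0, RSTART, RLENGTH); seen = 0 }' "$@"
}

# Invented sample: "A" is the 4th character and "B" the 7th.
show_match 'xxxAxxBxx'
printf '%s\n' 'xxxAxxBxx' '123456789' | extract_between
```

With this sample, match() sets RSTART to 4 and RLENGTH to 4, so the first call prints "4 4" and the second prints "4567", the characters sitting under A through B.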
Since you mentioned sed, you can do this with a sed script too:
/^x*Ax*Bx*$/{ # If an index line is matched, then
N # append the next (content) line into the pattern buffer
:a # label a
s/^x(.*\n).(.*)/\1\2/ # remove "x" from the index line start and a char from the content line start
ta # if a substitution happened in the previous line then jump back to a
:b # label b
s/(.*)x(\n.*).$/\1\2/ # remove "x" from the index line end and a char from the content line end
tb # if a substitution happened in the previous line then jump back to b
s/.*\n// # remove the index line
}
If you put this all on one command line, it looks like this:
$ sed -r '/^x*Ax*Bx*$/{N;:a;s/^x(.*\n).(.*)/\1\2/;ta;:b;s/(.*)x(\n.*).$/\1\2/;tb;s/.*\n//;}' example-file.txt
7890MNOP
34567890MNOPQRST
$
-r is needed so that sed can understand the regex grouping parentheses without extra escapes.
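As a rough illustration of how the two loops trim the lines in step, here is a sketch that runs the same one-liner on invented data and traces the pattern space in comments. The function name and the sample lines are assumptions, and GNU sed is assumed, as the -r flag already implies.

```shell
# Hypothetical wrapper around the sed one-liner shown above (GNU sed assumed).
sed_slice() {
    sed -r '/^x*Ax*Bx*$/{N;:a;s/^x(.*\n).(.*)/\1\2/;ta;:b;s/(.*)x(\n.*).$/\1\2/;tb;s/.*\n//;}' "$@"
}

# Invented sample: "A" sits over "4" and "B" sits over "7".
# Pattern space after N:             xxxAxxBxx\n123456789
# After loop a (3 leading x gone):   AxxBxx\n456789
# After loop b (2 trailing x gone):  AxxB\n4567
# After the final s/.*\n//:          4567
printf '%s\n' 'xxxAxxBxx' '123456789' | sed_slice
```

Each pass through loop a drops one leading x together with one leading content character, and loop b does the same from the right, so only the characters under A through B are left.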
FWIW, I don't think this could be done purely with grep, though I'd be happy to be proven wrong.
Although you can do this with AWK, I suggest Perl. Here's a script:
#!/usr/bin/env perl
use strict;
use warnings;
while (my $pattern = <>) {
    my $text = <>;
    my $start = index $pattern, 'A';
    my $stop = index $pattern, 'B', $start;
    print substr($text, $start, $stop - $start + 1), "\n";
}
You can name that script file whatever you like. If you were to name it interval and put it in the current directory, then you can mark it executable with chmod +x interval. Then you can run:
./interval paths...
Replace paths... with the actual pathname or pathnames to the files you want to parse. For example:
$ ./interval interval-example.txt
7890MNOP
34567890MNOPQRST
The way that script works is that, until end of input is reached (i.e., no more lines), it:
- Reads a line, $pattern, which is your string with A and B, and another line, $text, which is the string that will be sliced.
- Finds the index of the first A in $pattern and the first B aside from any that may have preceded that first A, and stores them in the $start and $stop variables, respectively.
- Slices out just the part of $text whose indices range from $start to $stop. Perl's substr function takes offset and length arguments, which is the reason for the subtraction, and you're including the letter immediately under B, which is the reason for adding 1.
- Prints just that part, followed by a line break.
If for some reason you'd prefer a short one-line command that achieves the same thing and is easy to paste in, though harder to understand and maintain, you could use this:
perl -wple '$i=index $_,"A"; $_=substr <>,$i,index($_,"B",$i)-$i+1' paths...
(As before, you have to replace paths... with the actual pathnames.)