Extract a string from a line between positions given by a pattern in another line

Using awk:

$ awk '!seen{match($0, /A.*B/);seen=1;next} {print substr($0,RSTART,RLENGTH);seen=0}' infile
7890MNOP
34567890MNOPQRST
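
The input file is not shown here. As a rough sketch, the following input is consistent with the output above. The file contents and the make_infile helper are assumptions reconstructed from that output, not the original data:

```shell
# Assumed input, reconstructed from the output shown above (hypothetical data).
# Odd lines are index lines (A marks the start column, B the end column);
# even lines are the text to slice.
make_infile() {
    printf '%s\n' \
        'xxxxxxAxxxxxxBxxxxxxxxxx' \
        '1234567890MNOPQRSTUVWXYZ' \
        'xxAxxxxxxxxxxxxxxBxxxxxx' \
        '1234567890MNOPQRSTUVWXYZ'
}

# Same awk program as above, reading the assumed input from stdin.
make_infile | awk '!seen{match($0, /A.*B/);seen=1;next} {print substr($0,RSTART,RLENGTH);seen=0}'
```

With this input, the first pair selects columns 7 through 14 and the second pair selects columns 3 through 18.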

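Explanation: on each index line, match() records where the A...B span starts (RSTART) and how long it is (RLENGTH); the seen flag then makes awk treat the next line as content and print that same span with substr(). From man awk: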

RSTART
          The index of the first character matched by match(); 0 if no
          match.  (This implies that character indices start at one.)

RLENGTH
          The length of the string matched by match(); -1 if no match.

match(s, r [, a])  
          Return the position in s where the regular expression r occurs, 
          or 0 if r is not present, and set the values of RSTART and RLENGTH. (...)

substr(s, i [, n])
          Return the at most n-character substring of s starting at i.
          If n is omitted, use the rest of s.
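
To see what match() leaves in RSTART and RLENGTH, here is a small sketch run on a single index line. The sample line and the show_match helper are made up for illustration:

```shell
# Hypothetical helper: print RSTART and RLENGTH for each input line.
show_match() {
    awk '{ match($0, /A.*B/); print RSTART, RLENGTH }'
}

# Assumed index line: A is in column 7 and B is in column 14.
printf '%s\n' 'xxxxxxAxxxxxxBxxxxxxxxxx' | show_match
# prints: 7 8
```

Because .* is greedy, A.*B runs from the first A to the last B on the line, so this approach assumes that only one B follows the A.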

Since you mentioned sed, you can do this with a sed script too:

/^x*Ax*Bx*$/{              # If an index line is matched, then
  N                        # append the next (content) line into the pattern buffer
  :a                       # label a
  s/^x(.*\n).(.*)/\1\2/    # remove "x" from the index line start and a char from the content line start
  ta                       # if a substitution happened in the previous line then jump back to a
  :b                       # label b
  s/(.*)x(\n.*).$/\1\2/    # remove "x" from the index line end and a char from the content line end
  tb                       # if a substitution happened in the previous line then jump back to b
  s/.*\n//                 # remove the index line
}

If you put this all on one command line, it looks like this:

$ sed -r '/^x*Ax*Bx*$/{N;:a;s/^x(.*\n).(.*)/\1\2/;ta;:b;s/(.*)x(\n.*).$/\1\2/;tb;s/.*\n//;}' example-file.txt
7890MNOP
34567890MNOPQRST
$ 

-r enables extended regular expressions (GNU sed also accepts -E for this), so the grouping parentheses can be written without backslash escapes.


FWIW, I don't think this could be done purely with grep, though I'd be happy to be proven wrong.


Although you can do this with AWK, I suggest Perl. Here's a script:

#!/usr/bin/env perl

use strict;
use warnings;

while (my $pattern = <>) {
    my $text = <>;
    my $start = index $pattern, 'A';
    my $stop = index $pattern, 'B', $start;
    print substr($text, $start, $stop - $start + 1), "\n";
}

You can name that script file whatever you like. If you name it interval and put it in the current directory, you can mark it executable with chmod +x interval. Then you can run:

./interval paths...

Replace paths... with the actual pathname or pathnames to the files you want to parse. For example:

$ ./interval interval-example.txt
7890MNOP
34567890MNOPQRST

The way that script works is that, until end of input is reached (i.e., no more lines), it:

  • Reads a line, $pattern, which is your string with A and B, and another line, $text, which is the string that will be sliced.
  • Finds the index of the first A in $pattern and the index of the first B after it (any B that precedes the first A is ignored), and stores them in the $start and $stop variables, respectively.
  • Slices out just the part of $text whose indices range from $start to $stop. Perl's substr function takes an offset and a length rather than two indices, which is the reason for the subtraction; adding 1 includes the character immediately under B.
  • Prints just that part, followed by a line break.
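
The same steps can also be sketched in awk with index() and substr(). This is only an illustration of the logic above, not a replacement for the Perl script. The interval_awk name is hypothetical, and the sketch assumes that every odd-numbered line contains an A with a B somewhere after it:

```shell
# Hypothetical awk counterpart of the Perl script (assumes well-formed input:
# odd lines are index lines containing A followed by B, even lines are text).
interval_awk() {
    awk '
        NR % 2 == 1 {
            start = index($0, "A")                      # 1-based column of the first A
            rest  = index(substr($0, start + 1), "B")   # distance from A to the first B after it
            len   = rest + 1                            # span length, A through B inclusive
            next
        }
        { print substr($0, start, len) }                # slice the text line
    ' "$@"
}

# Assumed sample pair: A in column 7, B in column 14.
printf '%s\n' 'xxxxxxAxxxxxxBxxxxxxxxxx' '1234567890MNOPQRSTUVWXYZ' | interval_awk
```

Note that awk's index() counts from 1 and has no start-position argument, unlike Perl's index, which is why the search for B is done on the substring after A.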

If for some reason you'd prefer a short one-line command that achieves the same thing and is easy to paste in (but is also harder to understand and maintain), then you could use this:

perl -wple '$i=index $_,"A"; $_=substr <>,$i,index($_,"B",$i)-$i+1' paths...

(As before, you have to replace paths... with the actual pathnames.)