Extract a string from a line between positions given by a pattern in another line

Using awk:

$ awk '!seen{match($0, /A.*B/);seen=1;next} {print substr($0,RSTART,RLENGTH);seen=0}' infile
7890MNOP
34567890MNOPQRST

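Explanation: the command treats the input as pairs of lines. On the first line of each pair, match() records where /A.*B/ matches; on the second, substr() prints the text at that same position and length. From man awk: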

RSTART
          The index of the first character matched by match(); 0 if no
          match.  (This implies that character indices start at one.)

RLENGTH
          The length of the string matched by match(); -1 if no match.

match(s, r [, a])  
          Return the position in s where the regular expression r occurs, 
          or 0 if r is not present, and set the values of RSTART and RLENGTH. (...)

substr(s, i [, n])
          Return the at most n-character substring of s starting at i.
          If n is omitted, use the rest of s.
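
To see what match() leaves in RSTART and RLENGTH, here is a small sketch on a hypothetical two-line input. The index line xxAxxBxx and the content line abcdefgh are made up for illustration and are not taken from the question's file; the extra print statement is only there to expose the two variables.

```shell
# Hypothetical sample: an index line followed by a content line.
out=$(printf '%s\n' 'xxAxxBxx' 'abcdefgh' |
  awk '!seen { match($0, /A.*B/)                     # A is at column 3, B at column 6
               print "RSTART=" RSTART, "RLENGTH=" RLENGTH
               seen = 1; next }
             { print substr($0, RSTART, RLENGTH); seen = 0 }')
printf '%s\n' "$out"
# prints:
# RSTART=3 RLENGTH=4
# cdef
```

Because .* is greedy, an index line containing several B characters would be matched up to the last one.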

Since you mentioned sed, you can do this with a sed script too:

/^x*Ax*Bx*$/{              # If an index line is matched, then
  N                        # append the next (content) line into the pattern buffer
  :a                       # label a
  s/^x(.*\n).(.*)/\1\2/    # remove "x" from the index line start and a char from the content line start
  ta                       # if a substitution happened on the previous line, jump back to a
  :b                       # label b
  s/(.*)x(\n.*).$/\1\2/    # remove "x" from the index line end and a char from the content line end
  tb                       # if a substitution happened on the previous line, jump back to b
  s/.*\n//                 # remove the index line
}

If you put this all on one command line, it looks like this:

$ sed -r '/^x*Ax*Bx*$/{N;:a;s/^x(.*\n).(.*)/\1\2/;ta;:b;s/(.*)x(\n.*).$/\1\2/;tb;s/.*\n//;}' example-file.txt
7890MNOP
34567890MNOPQRST
$

-r turns on extended regular expressions, so that sed understands the grouping parentheses without extra backslash escapes. GNU sed also accepts -E for the same purpose.
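
As a rough illustration of how the two loops trim the pattern space, here is the same script run on a hypothetical two-line input (xxAxxBxx over abcdefgh, made up for this sketch). It assumes GNU sed and uses -E, which GNU sed treats as a synonym for -r.

```shell
# Hypothetical sample input; assumes GNU sed.
# Pattern space after N:       xxAxxBxx\nabcdefgh
# after loop a (leading x):    AxxBxx\ncdefgh
# after loop b (trailing x):   AxxB\ncdef
out=$(printf '%s\n' 'xxAxxBxx' 'abcdefgh' | sed -E '
/^x*Ax*Bx*$/{
  N
  :a
  s/^x(.*\n).(.*)/\1\2/
  ta
  :b
  s/(.*)x(\n.*).$/\1\2/
  tb
  s/.*\n//
}')
printf '%s\n' "$out"
# prints: cdef
```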

FWIW, I don't think this could be done purely with grep, though I'd be happy to be proven wrong.

Although you can do this with AWK, I suggest Perl. Here's a script:

#!/usr/bin/env perl

use strict;
use warnings;

while (my $pattern = <>) {
    my $text = <>;
    my $start = index $pattern, 'A';
    my $stop = index $pattern, 'B', $start;
    print substr($text, $start, $stop - $start + 1), "\n";
}

You can name that script file whatever you like. If you name it interval and put it in the current directory, you can mark it executable with chmod +x interval. Then you can run:

./interval paths...

Replace paths... with the actual pathname or pathnames to the files you want to parse. For example:

$ ./interval interval-example.txt
7890MNOP
34567890MNOPQRST

The script works like this. Until end of input is reached (i.e., no more lines), it:

1. Reads a line, $pattern, which is your string with A and B, and another line, $text, which is the string that will be sliced.
2. Finds the index of the first A in $pattern and of the first B that follows that A (any B before the A is ignored), and stores them in $start and $stop, respectively.
3. Slices out just the part of $text whose indices range from $start to $stop. Perl's substr function takes offset and length arguments rather than start and end positions, which is the reason for the subtraction; adding 1 includes the character directly under B.
4. Prints just that part, followed by a line break.
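
The index arithmetic in these steps can also be sketched in awk. It is swapped in here only to keep the illustration a single shell command and is not the Perl script itself. awk's index() and substr() count from 1 where Perl's count from 0, but the length, stop - start + 1, comes out the same. The sample input is hypothetical.

```shell
# Hypothetical sample: odd-numbered lines are index lines, even-numbered lines are content.
out=$(printf '%s\n' 'xxAxxBxx' 'abcdefgh' |
  awk 'NR % 2 { start = index($0, "A")                             # first A (1-based)
                stop  = start - 1 + index(substr($0, start), "B")  # first B at or after that A
                next }
              { print substr($0, start, stop - start + 1) }')      # +1 keeps the character under B
printf '%s\n' "$out"
# prints: cdef
```

Unlike the regex /A.*B/, this stops at the first B after the A, which matches what the Perl script does.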

If for some reason you'd prefer a short one-line command that achieves the same thing but is easily pasted in--but also is harder to understand and maintain--then you could use this:

perl -wple '$i=index $_,"A"; $_=substr <>,$i,index($_,"B",$i)-$i+1' paths...

(As before, you have to replace paths... with the actual pathnames.)
