How to grep a section of a file in bash shell
How can I "grep" the lines between an occurrence of some string1 and the (Nth) occurrence of some string2?
For example, if the file has these lines:
A
B
C
D
E
F
G
B
C
E
Q
I want to get each run of lines that begins with a B and ends with an E (B C D E and B C E above).
Can this be done using grep, or with some other Unix command-line tool?
grep is not well suited for this task; you need to go one tool "up":
sed -n '/^B/,/^E/p' infile
Output:
B
C
D
E
B
C
E
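If only the first block is needed, the same range can be combined with sed's q command. The following is a minimal sketch, not part of the original answer; the function name first_block is made up for illustration, and the patterns ^B and ^E are hard-coded to match the example file.

```shell
#!/bin/sh
# first_block [FILE...]: print only the first B..E block, then stop reading.
# (Hypothetical helper name; patterns are hard-coded for the example.)
first_block() {
    # p prints every line of the range; q quits once the closing E line was printed
    sed -n '/^B/,/^E/{p;/^E/q;}' "$@"
}

# Sample input from the question, one letter per line.
printf '%s\n' A B C D E F G B C E Q | first_block
```

Because sed quits at the first closing E, the second block is never read, which can matter for large files.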
With regard to the Nth requirement, I think it's easiest if you again advance one tool "up", namely to awk:
awk '/^B/ { f = 1; n++ } f && n == wanted; /^E/ { f = 0 }' wanted=2 infile
Output:
B
C
E
The flag f is set when /^B/ is encountered and unset when /^E/ occurs, much in the same way the sed range notation works. n keeps a tally of how many blocks have started, and when f && n == wanted is true, the default action ({ print $0 }) is executed.
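To make the flag logic easier to try out, here is the same awk program wrapped in a small shell function with one comment per rule. This is only a sketch: the function name print_block is an assumption for illustration, and the patterns are still hard-coded to ^B and ^E.

```shell
#!/bin/sh
# print_block N [FILE...]: print only the N-th block that starts at a line
# matching ^B and ends at a line matching ^E. (Hypothetical helper name.)
print_block() {
    wanted=$1
    shift
    awk -v wanted="$wanted" '
        /^B/ { f = 1; n++ }   # entering a block: raise the flag and count the block
        f && n == wanted      # inside the wanted block: the default action prints the line
        /^E/ { f = 0 }        # leaving a block: lower the flag
    ' "$@"
}

# Sample input from the question, one letter per line.
printf '%s\n' A B C D E F G B C E Q | print_block 2
```

The order of the rules matters: the print rule sits between the two flag rules, so both the opening B line and the closing E line are included in the output.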
@Thor's sed command cannot be beaten, but with the following perl script I try to address the part of your question in parentheses: "... the (Nth) occurrence ...".
Usage:
./script <start-regex> <end-regex> [N]
Examples with the file in your question:
$ ./script "B" "E" < examplefile
B
C
D
E
B
C
E
$ ./script "B" "E" 2 < examplefile
B
C
D
E
F
G
B
C
E
There is no error checking whatsoever, and the script is non-greedy: from A B C D E E F, only B C D E will be matched with N=1.
#!/usr/bin/perl
use strict;
use warnings;

my $begin_str = $ARGV[0];
my $end_str   = $ARGV[1];
my $n         = $ARGV[2] || 1;   # N defaults to 1 when omitted
my ($flag, $i, $out) = (0, 0, "");

while (<STDIN>) {
    if ($_ =~ $begin_str) { $flag = 1 }             # beginning of match, set flag
    if ($_ =~ $end_str && $flag == 1) { $i++ }      # i-th occurrence of end string
    if ($i == $n) {                                 # end of match after n occurrences of end string
        $flag = 2;
        $i = 0;
    }
    if ($flag >= 1) {                               # append current line to matching part
        $out .= $_;
    }
    if ($flag == 2) {                               # after detecting the end of a match, print the complete match
        print $out;
        # print "---\n";                            # separator after a match
        $out = "";
        $flag = 0;
    }
}
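For comparison, the same buffering idea can be sketched in awk. This is an illustrative equivalent under stated assumptions, not part of the original answer: the function name between and the awk variable names start, stop, n, f, c and buf are made up here, input is read from standard input, and the regexes are passed with -v, so awk interprets backslash escapes in them.

```shell
#!/bin/sh
# between START_REGEX END_REGEX [N]: print every stretch that runs from a line
# matching START_REGEX to the N-th following line matching END_REGEX.
# (Hypothetical helper name; N defaults to 1.)
between() {
    awk -v start="$1" -v stop="$2" -v n="${3:-1}" '
        $0 ~ start { f = 1 }              # a start line opens (or continues) a match
        f { buf = buf $0 "\n" }           # buffer lines while inside a match
        f && $0 ~ stop && ++c == n {      # N-th end line reached: flush and reset
            printf "%s", buf
            buf = ""; f = 0; c = 0
        }
    '
}

# Sample input from the question, one letter per line.
printf '%s\n' A B C D E F G B C E Q | between B E 2
```

As in the perl script, a match that has not reached its N-th end line by the end of the input is discarded rather than printed.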