Logic for implementing GNU's grep -A -B -C flag
I am trying to implement the logic behind grep's -A -B -C flags. To understand how things work, I have cloned the grep C code and looked into it too. But I am not good with C so I am finding it very difficult.
Let's say I need to search a piece of string in array of string. I can easily do that. I have also saved the index number of matches in another array as well.
Now, how do I implement the logic for those flags. I dont need the code, I just want to understand the logic :)
As a start I'd keep a buffer that holds the last A
lines read (or C
, depending on what has been specified). This buffer has size 1 by default.
As soon as grep finds the pattern it is searching for, it can output the buffer and the consecutive B
(or C
) lines.
The thing to think about is: how to react if grep finds another occurrence of its search pattern within the trailing B
lines. I'd probably opt for printing another B
lines.
Let's show this by example:
index of match = 3
length of lines = 5
A = 2
print lines with indices max(0, 3-2) to 3, inclusive
B = 2
print lines with indices 3 to min(5-1, 3+2), inclusive
C = 3
print lines with indices max(0, 3-3) to min(5-1, 3+3), inclusive
In other words, make sure you don't go out of bounds of your lines range and just look forward, backward, or both depending on the flag.
If there are multiple flags, do as follows:
If A and B: set the min and max bounds accorddingly
If A and C: set the min as min of A and C
and max of the result of the previous calculation with 0
to not go out of bounds.
...
The above algorithm will potentially produce duplicate lines if the lower bounds of one match overlaps the upper bound of the previous one. One way to account for that is as follows:
For each match, if it is not the first in your matches array,
set its lower bound as the max of its lower bound
and the upper bound of the previous match.
If the result is greater than the match itself, set it to the match.
This will produce one duplicate line (the match itself)
but you probably want to keep that for presentation purposes.