Is a /start/,/end/ range expression ever useful in awk?

I've always contended that you should never use a range expression like:

/start/,/end/

in awk because although it makes the trivial case where you only want to print matching text including the start and end lines slightly briefer than the alternative*:

/start/{f=1} f{print; if (/end/) f=0}

when you want to tweak it even slightly to do anything else, it requires a complete rewrite or results in duplicated or otherwise undesirable code. For example, if you want to print the matching text excluding the range delimiters, with the second form above you'd just move the components around:

f{if (/end/) f=0; else print} /start/{f=1}

but if you started with /start/,/end/ you'd need to abandon that approach in favor of what I just posted or you'd have to write something like:

/start/,/end/{ if (!/start|end/) print }

i.e. duplicate the conditions, which is undesirable.
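
The two delimiter-excluding variants are also not strictly equivalent. The sketch below uses hypothetical sample input and helper names (sample, flag_form, range_form) that are not part of the original post; it suggests where they diverge: the duplicated condition in the range form also suppresses inner lines that merely contain start or end.

```shell
# Hypothetical sample: the third line merely mentions "start" inside the block.
sample() {
  printf '%s\n' 'before' 'start' 'restart the job' 'data' 'end' 'after'
}

# Flag form: only the delimiter lines themselves are dropped.
flag_form()  { awk 'f{if (/end/) f=0; else print} /start/{f=1}'; }

# Range form: the extra test also drops inner lines matching start or end.
range_form() { awk '/start/,/end/{ if (!/start|end/) print }'; }

echo 'flag form:';  sample | flag_form
echo 'range form:'; sample | range_form
```

With this input, flag_form prints both "restart the job" and "data", while range_form prints only "data".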

Then I saw a question that required identifying the LAST end in a file, where a range expression was used in the solution, and I thought it might have some value (see https://stackoverflow.com/a/21145009/1745001).

Now, though, I'm back to thinking that it's just not worth bothering with range expressions at all and a solution that doesn't use range expressions would have worked just as well for that case.

So - does anyone have an example where a range expression actually adds noticeable value to a solution?

*I used to use:

/start/{f=1} f; /end/{f=0}

but too many times I found I had to do something additional when f is true and /end/ is found (or, to put it another way, do something when /end/ is found ONLY IF f is true), so now I just try to stick to the slightly less brief but much more robust and extensible:

/start/{f=1} f{print; if (/end/) f=0}
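
To make the difference in this footnote concrete, here is a small sketch. The input, the counter n, and the function names are hypothetical, added only for illustration: each form counts closed blocks in its end action, and the input has a stray end line before the first start.

```shell
# Hypothetical input with a stray "end" before any "start".
sample() { printf '%s\n' 'end' 'start' 'a' 'end'; }

# Short form: the /end/ action fires for every end line, even outside a block.
short_form() {
  awk '/start/{f=1} f; /end/{f=0; n++} END{print "blocks:", n+0}'
}

# Robust form: the end action only runs while f is set.
robust_form() {
  awk '/start/{f=1} f{print; if (/end/) {f=0; n++}} END{print "blocks:", n+0}'
}

sample | short_form
sample | robust_form
```

Both forms print the same three lines of the block, but the short form reports "blocks: 2" because its /end/ action also fires for the stray line, while the robust form reports "blocks: 1".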

Solution 1:

Interesting. I also often start with a range expression and then later on switch to using a variable.

I think a situation where this could be useful, aside from the pure range-only situations, is when you want to print a match, but only if it lies within a certain range. It is also immediately obvious what it does. For example:

awk '/start/,/end/{if(/ppp/)print}' file

with this input:

start
dfgd gd
ppp 1
gfdg
fd gfd
end
ppp 2
ppp 3
start
ppp 4
ppp 5
end
ppp 6
ppp 7
gfdgdgd

will produce:

ppp 1
ppp 4
ppp 5

One could of course also use:

awk '/start/{f=1} /ppp/ && f; /end/{f=0}' file

But it is longer and somewhat less readable.

Solution 2:

While you are right that the /start/,/end/ range expression can easily be reimplemented with a conditional, it has many interesting use cases where it is used on its own. As you observe, it might have little value for processing tabular data, the main but not only use case of awk.

So - does anyone have an example where a range expression actually adds noticeable value to a solution?

In those use cases, the range expression improves legibility. Here are a few examples where the range expression accurately selects the text to be processed. These are only a handful of examples, but there are countless similar applications, demonstrating the incredible versatility of awk.

Filter logs within a time range

Assuming each log line starts with an ISO timestamp, and that the log contains a line stamped exactly at each boundary, the filter below selects all events in a given range of 1 hour:

awk '/^2015-06-30T12:00:00Z/,/^2015-06-30T13:00:00Z/'
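
As a rough illustration, the sketch below runs that filter on a hypothetical five-line log (the log contents and the function names log and in_hour are made up for this example) in which both boundary timestamps appear verbatim.

```shell
# Hypothetical log lines; the boundary timestamps are present verbatim.
log() {
  printf '%s\n' \
    '2015-06-30T11:59:58Z boot' \
    '2015-06-30T12:00:00Z job started' \
    '2015-06-30T12:30:10Z job running' \
    '2015-06-30T13:00:00Z job finished' \
    '2015-06-30T13:00:05Z idle'
}

# The range filter from the answer, wrapped in a function for reuse.
in_hour() { awk '/^2015-06-30T12:00:00Z/,/^2015-06-30T13:00:00Z/'; }

log | in_hour
```

This selects the three lines from "job started" through "job finished", including both boundary lines. If the 12:00:00Z line is removed from the input, the range never opens and nothing is printed.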

Extract a document from a file

awk '/---- begin file.data ----/,/---- end file.data ----/'

This can be used to bundle resources with shell scripts (with cat), to extract parts of GPG-signed messages (prepared with --clearsign), or more generally parts of MIME messages.

Process LaTeX files

The range pattern can be used to match LaTeX environments, so for instance we can select the abstracts of all articles in our directory:

awk '/begin{abstract}/,/end{abstract}/' *.tex

or all the theorems, to prepare a theorem database!

awk '/begin{theorem}/,/end{theorem}/' *.tex

or write a linter ensuring that theorems do not contain citations (if we regard this as bad style):

awk '
  /begin{theorem}/,/end{theorem}/ { if(/\\cite{/) { c+= 1 } }
  END { printf("There were %d bad-style citations.\n", c) }
'
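
A quick way to try the linter is sketched below on a hypothetical LaTeX fragment (the fragment and the function names tex and lint are invented for this example). The braces are escaped here as a precaution, on the assumption that some awk implementations may read an unescaped { as the start of an interval expression; otherwise the program is the one above.

```shell
# Hypothetical LaTeX fragment: one citation inside a theorem, two outside.
tex() {
  printf '%s\n' \
    'See \cite{intro} for background.' \
    '\begin{theorem}' \
    'By \cite{lemma}, the claim holds.' \
    '\end{theorem}' \
    'Compare \cite{other}.'
}

# Count \cite{ occurrences (one per line at most) inside theorem environments.
lint() {
  awk '
    /begin\{theorem\}/,/end\{theorem\}/ { if (/\\cite\{/) { c += 1 } }
    END { printf("There were %d bad-style citations.\n", c) }
  '
}

tex | lint
```

Only the citation between the theorem delimiters is counted, so this reports 1 bad-style citation for the sample fragment.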

or preprocess tables, etc.