Variables for constructing patterns for use with sed
I have written a bash function to print sections of text enclosed between lines matching ## mode: org and ## # End of org in a file, with an empty line between sections. Before the ##, there can be any number of spaces.
Here is an example of a file to extract information from.
file: test.sh
## mode: org
## * Using case statement
## # End of org
case $arg in
("V")
echo "Author"
;;
(*)
## mode: org
## ** Silent Error Reporting Mode (SERM) in getopts
## *** Detects warnings without printing built-in messages.
## *** Enabled by colon {:} as first character in shortopts.
## # End of org
break
;;
esac
The desired output would be
* Using case statement
** Silent Error Reporting Mode (SERM) in getopts
*** Detects warnings without printing built-in messages.
*** Enabled by colon {:} as first character in shortopts.
Here is the function I am using
capture-org ()
{
local efile="$1"
local begsec="## mode: org"
local endsec="## # End of org"
sed -n "/^[[:space:]]*${begsec}$/,/^[[:space:]]*${endsec}$/s/ *//p" "$efile" |
sed 's/^'"${begsec}"'$/\n'"${begsec}"'/' |
sed '/^'"${begsec}"'$/d' | sed '/^'"${endsec}"'$/d' | cut -c 3-
}
I would like to simplify the function, using variables to construct the patterns, but I need some help combining the commands so that I do not have to call sed so many times. Perhaps using awk would be a better strategy.
capture-org ()
{
local efile="$1"
local begsec='^[[:space:]]*## mode: org$'
local endsec='^[[:space:]]*## # End of org$'
sed -n "/${begsec}/,/${endsec}/s/ *//p" "$efile" |
sed 's/^## # End of org$/## # End of org\n/' |
sed '/^## mode: org$/d' | sed '/^## # End of org$/d' | cut -c 3-
}
Solution 1:
I would indeed use something more sophisticated for this. Like awk:
$ awk -v start="$begsec" -v end="$endsec" \
'{
if($0~start){want=1; next}
if($0~end){want=0; print ""; next}
gsub(/\s*#+\s*/,"");
} want' file
* Using case statement
** Silent Error Reporting Mode (SERM) in getopts
*** Detects warnings without printing built-in messages.
*** Enabled by colon {:} as first character in shortopts.
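Note that the command above expects begsec and endsec to already be set in the shell. Because awk treats the values passed with -v as dynamic regular expressions when they are used with ~, the anchored patterns from the question's second function can be passed in unchanged. Here is a minimal sketch of that, assuming those patterns; the helper name extract_org is made up for illustration, and the anchored sub() is my own choice so that only the leading comment marker is stripped:

```shell
# Assumption: these are the anchored patterns from the question's second function.
begsec='^[[:space:]]*## mode: org$'
endsec='^[[:space:]]*## # End of org$'

# Hypothetical helper (not from the original post) wrapping the one-liner.
extract_org() {
    awk -v start="$begsec" -v end="$endsec" '
        $0 ~ start { want = 1; next }            # opening marker: start printing
        $0 ~ end   { want = 0; print ""; next }  # closing marker: stop, print a blank separator
        want       {
            # Strip only the leading whitespace and comment marker,
            # leaving any "#" later in the line untouched.
            sub(/^[[:space:]]*#+[[:space:]]*/, "")
            print
        }
    ' "$1"
}
```

It would be called as extract_org test.sh. Since the patterns are anchored with ^ and $, a line that merely mentions the marker somewhere in the middle is not treated as a section boundary.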
Or, using your last function there as a template:
capture-rec ()
{
local begsec='## mode: org'
local endsec='## # End of org'
awk -v start="$begsec" -v end="$endsec" \
'{
if($0~start){want=1; next}
if($0~end){want=0; print ""; next}
gsub(/\s*#+\s*/,"");
} want' "$1"
}
One caveat that may be important is that this doesn't require $begsec and $endsec to be the only things on the line (other than leading whitespace), as your approach did; it simply searches for them anywhere on the line. I am assuming this isn't a very big deal considering what you are looking for, but if it is, you can use this instead, which removes whitespace at the beginning and end of the line before matching:
capture-rec ()
{
local begsec='## mode: org'
local endsec='## # End of org'
awk -v start="$begsec" -v end="$endsec" \
'{
sub(/^[[:space:]]*/,"");
sub(/[[:space:]]*$/,"");
if($0==start){ want=1; next}
if($0==end){ want=0; print ""; next}
gsub(/\s*#+\s*/,"");
} want' "$1"
}
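If you would rather stay with sed, the chained calls can probably be collapsed into a single invocation by grouping commands inside the address range. This is only a sketch under a few assumptions: the function name capture_org_sed and the ws variable are made up for illustration, and the markers must not contain a slash or regex metacharacters, because they are pasted directly into the sed script:

```shell
# Hypothetical single-sed variant (not from the original post).
capture_org_sed() {
    local begsec='## mode: org'
    local endsec='## # End of org'
    local ws='[[:space:]]*'
    # -n suppresses automatic printing, so only the explicit p commands produce output.
    sed -n "/^${ws}${begsec}${ws}\$/,/^${ws}${endsec}${ws}\$/{
        /^${ws}${begsec}${ws}\$/d
        /^${ws}${endsec}${ws}\$/{
            s/.*//
            p
            d
        }
        s/^${ws}#*${ws}//p
    }" "$1"
}
```

Within the range, the opening marker line is dropped, the closing marker line is replaced by an empty line that separates the sections, and every other line is printed with its leading whitespace and comment marker removed. It would be called as capture_org_sed test.sh.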