How can I edit a range of text between 2 symbols? awk, sed, regex
Using the "*" symbol as a marker (it doesn't have to be that one; any special character will do), how can I edit the text from this:
*berry
straw
rasp
blue
boysen
*
blahblah
blahblah
blahblah
*berry
straw
blue
*
blah
*table
vege
pingpong
*
To this:
strawberry
raspberry
blueberry
boysenberry
blahblah
blahblah
blahblah
strawberry
blueberry
blah
vegetable
pingpongtable
Everything after the opening asterisk should be appended to each following line until the closing asterisk is found.
Any leads on how I can go about this? (sed or awk would be preferred, but if you can think of another way, please shoot me your code!)
I know how to remove all lines containing an asterisk; it's just the character placement part I can't work out.
This awk code could be enough:
awk -F'*' 'NF == 2 {label = $2; next} {$0 = $0 label} 1'
To break it down:
- Use * as the field separator. This way, we can simply examine the number of fields (NF) to determine if the beginning or end of a block is reached.
- When there are two fields, we save the second field in label and continue to the next line.
- From then on, we append that label to the current line, and then print. If the label is empty, we are outside a block and there's no effect. If not, we get the required output.
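For readers less familiar with awk, the same field-counting logic can be sketched in Python. This is only an illustration of the steps above, not part of the awk answer; the function name append_labels and the shortened sample list are made up for the example.

```python
def append_labels(lines, marker="*"):
    """Mirror the awk logic: a line with exactly two marker-separated
    fields sets the label; every other line gets the label appended."""
    label = ""
    out = []
    for line in lines:
        fields = line.split(marker)
        if len(fields) == 2:       # like NF == 2: "*berry" or a bare "*"
            label = fields[1]      # a bare "*" resets the label to ""
            continue               # like awk's `next`
        out.append(line + label)
    return out


# Hypothetical, shortened version of the sample data from the question.
sample = ["*berry", "straw", "rasp", "*", "blah", "*table", "vege", "*"]
print("\n".join(append_labels(sample)))
```

As in the awk version, any line containing exactly one marker counts as a block boundary, even if the marker is not the first character.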
In sed, you could copy the "special" line into hold space before deleting it
sed -e '/^\*/{h;d;}'
and then append the hold space to each succeeding pattern space, replacing the resulting newline and marker character
-e '{G;s/\n\*//;}'
Testing it with your data,
$ sed -e '/^\*/{h;d;}' -e '{G;s/\n\*//;}' file
strawberry
raspberry
blueberry
boysenberry
blahblah
blahblah
blahblah
strawberry
blueberry
blah
vegetable
pingpongtable
Note: this doesn't stop when it encounters the second asterisk. It does exactly the same thing there, but the held line is then a bare *, so nothing is appended until the next *sometext line is matched.
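To make the note concrete, here is a rough Python emulation of what the two sed expressions do to the pattern space and hold space. It is a simplified model written for this explanation (the function name sed_hold_space is made up), not a description of how sed is implemented.

```python
import re


def sed_hold_space(lines):
    # Emulates: sed -e '/^\*/{h;d;}' -e '{G;s/\n\*//;}'
    hold = ""                          # the hold space starts out empty
    out = []
    for pattern in lines:
        if pattern.startswith("*"):
            hold = pattern             # h: copy pattern space to hold space
            continue                   # d: delete it and start the next cycle
        pattern = pattern + "\n" + hold                   # G: append newline + hold space
        pattern = re.sub(r"\n\*", "", pattern, count=1)   # s/\n\*//
        out.append(pattern)
    return out


print(sed_hold_space(["*berry", "straw", "*", "blah"]))
```

After the bare * line, the hold space is just "*", so G followed by the substitution leaves the line unchanged. In this model, a line that comes before any marker keeps the newline added by G, because there is no asterisk for the substitution to match; the sample data never hits that case since it starts with a marker line.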
Here's a Perl way:
$ perl -lne '/^\*(.*)/ || print "$_$1"' file
strawberry
raspberry
blueberry
boysenberry
blahblah
blahblah
blahblah
strawberry
blueberry
blah
vegetable
pingpongtable
Explanation
The -n will cause Perl to read each line of the input file, saving it in the special variable $_. The -l will cause it to i) strip trailing newlines (\n) from each line and ii) add a newline to each call of print. The -e introduces the script that is applied to each line.
- /^\*(.*)/ : match lines that start with an asterisk and save everything after the asterisk as $1 (that's what the parentheses do).
- || print "$_$1" : the || is a logical OR. Therefore, the print will only be executed if the current line did not start with an asterisk. In that case, we print the current line ($_) along with whatever is currently saved as $1 (the pattern following the most recent asterisk).
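The same capture-and-reuse idea can be sketched in Python with the re module. Unlike the Perl one-liner, which relies on $1 keeping its value from the last successful match, this sketch stores the captured text in an explicit variable. The names MARKER and append_captured are made up for the example.

```python
import re

# Same pattern as the Perl one-liner: a leading asterisk, then capture the rest.
MARKER = re.compile(r"^\*(.*)")


def append_captured(lines):
    suffix = ""                        # plays the role of Perl's $1
    for line in lines:
        m = MARKER.match(line)
        if m:
            suffix = m.group(1)        # remember the text after the asterisk
        else:
            yield line + suffix        # non-marker line: print it with the suffix


for result in append_captured(["*table", "vege", "pingpong", "*", "blah"]):
    print(result)
```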
As usual, there are many ways of doing this. A silly and inefficient one, but one which highlights the string manipulation capabilities of the shell, is:
$ while read line; do
[[ $line =~ ^\* ]] && pat="${line#\*}" || printf "%s%s\n" "$line" "$pat";
done < file
strawberry
raspberry
blueberry
boysenberry
blahblah
blahblah
blahblah
strawberry
blueberry
blah
vegetable
pingpongtable
Explanation
- while read line; do ... ; done < file : this is a classic while loop which will read each line of the input file file and save it as $line.
- [[ $line =~ ^\* ]] && pat="${line#\*}" : if the line starts with an *, strip that leading * (that's what the ${line#\*} does) and save the rest as $pat.
- || printf "%s%s\n" "$line" "$pat"; : if the previous command failed (so, the line does not start with an asterisk), print the line and the current value of $pat.
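One detail worth spelling out is that ${line#\*} removes a single leading asterisk only. A small Python sketch of that behaviour, next to str.lstrip('*'), which strips every leading asterisk, may help when porting the loop. The helper name strip_one_marker is hypothetical.

```python
def strip_one_marker(line, marker="*"):
    # Remove at most one leading marker, as the shell's ${line#\*} does.
    return line[len(marker):] if line.startswith(marker) else line


print(strip_one_marker("*berry"))   # berry
print(strip_one_marker("**bold"))   # *bold
print("**bold".lstrip("*"))         # bold
```

For the sample data the two behave identically; they only differ when a label itself begins with an asterisk.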
Through my favorite Python...
with open('/path/to/the/file') as f:
    counter = False
    for line in f:
        if line.startswith('*') and not counter:
            m = line.strip().lstrip('*')
            counter = True
        elif line.startswith('*') and counter:
            counter = False
        elif counter:
            if not line.startswith('*'):
                print(line.strip() + m)
        else:
            print(line.strip())
Came here late. Here is another Python approach:
#!/usr/bin/env python2
with open('/path/to/file.txt') as f:
    for lines in f.read().split('*'):
        entries = lines.rstrip().split('\n')
        for i in range(1, len(entries)):
            print entries[i] + entries[0]
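For anyone on Python 3, the split-on-marker idea could look roughly like the sketch below. It works on a string rather than a file so it is easy to try out; the function name expand_blocks and the shortened sample text are made up for the example.

```python
def expand_blocks(text, marker="*"):
    """Split the text on the marker; in each chunk the first line is the
    suffix and the remaining lines get that suffix appended."""
    out = []
    for chunk in text.split(marker):
        entries = chunk.rstrip().split("\n")
        suffix = entries[0]            # empty for chunks that follow a bare "*"
        out.extend(entry + suffix for entry in entries[1:])
    return out


# Hypothetical, shortened version of the sample data from the question.
sample = "*berry\nstraw\nblue\n*\nblah\n*table\nvege\n*\n"
print("\n".join(expand_blocks(sample)))
```

One limitation of this approach: if the input has text before the first marker, the first of those lines is treated as a suffix rather than printed, so it suits input that starts with a marker line, as the sample does.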