How do I replace multiple lines with a single word in a file (in-place replace)?

Solution 1:

This can be done very easily in Perl:

$ perl -i -p0e 's/START.*?END/SINGLEWORD/s' file
$ cat file
My block of line starts from here 
SINGLEWORD
and end to here for example. 

Explanation

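-i edits the file in place instead of printing to standard output

-0 sets the input record separator to the null character, so a file that contains no null bytes is read as a single record

-p applies the script given by -e to each record and prints the result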

The regexp modifier:

  • /s Treat string as single line. That is, change . to match any character whatsoever, even a newline, which normally it would not match.

Why the ?:

  • By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a ?.

Solution 2:

I was wondering if this is possible without perl, python and others. And I found this solution using sed:

$ sed ':a;N;$!ba;s/START.*END/SINGLEWORD/g' filename

Explanation:

  1. :a create a label 'a'
  2. N append the next line to the pattern space
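  3. $!ba if not on the last line, branch (go to) label 'a'
  4. s/START.*END/SINGLEWORD/g substitute SINGLEWORD for every match of START.*END (g means global: as many times as it can)

Unlike the Perl version, .* here is greedy (sed has no non-greedy quantifier), so if the file contains several blocks, everything from the first START to the last END is replaced. The command prints to standard output; with GNU sed, add -i to edit the file in place.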

It was found here.
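A minimal sketch of the sed approach applied to a throwaway file, assuming a sample with two START...END blocks (the content and paths are made up for illustration). The script is the same as above, split into separate -e pieces, which some non-GNU sed implementations require after a label. The result is written to a temporary file and moved back; with GNU sed, -i could replace that step.

```shell
# Build a throwaway sample file with two START...END blocks.
sample="$(mktemp -d)/sample.txt"
printf '%s\n' head START a END middle START b END tail > "$sample"

# Slurp the whole file into the pattern space, then substitute once.
# Write to a temporary file and move it back to emulate in-place editing.
sed -e ':a' -e 'N' -e '$!ba' -e 's/START.*END/SINGLEWORD/' "$sample" > "$sample.new" \
    && mv "$sample.new" "$sample"

cat "$sample"
```

Because .* is greedy, the match runs from the first START to the last END, so the "middle" line between the two blocks is swallowed as well. If the blocks must be replaced separately, the Perl version with .*? is the safer choice.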

Solution 3:

While ripgrep doesn't support in-place replacement, I've found that its current --replace functionality is already useful for this use case:

rg --replace 'SINGLEWORD' --passthru --no-line-number \
--multiline --multiline-dotall 'START.*?END' input.txt > output.txt

Explanation:

  • --replace 'SINGLEWORD' enables replacement mode and sets the replacement string. Can include captured regex groups by using $1 etc.
  • --passthru is needed since ripgrep usually only shows the lines matching the regex pattern. With this option it also shows all lines from the file that don't match.
  • --no-line-number / -N is because by default ripgrep includes the line numbers in the output (useful when only the matching lines are shown).
  • --multiline / -U enables multiline processing, since it's disabled by default.
  • --multiline-dotall is only needed if you want the dot ('.') regex pattern to match newlines (\n).
  • > output.txt is needed since in-place replacement isn't supported. With the --passthru and --no-line-number options, the standard output matches the desired new file with replacements and can be saved as usual.

However, this command isn't as useful for processing multiple files, as it needs to be run separately per file.
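One way to work around the per-file limitation is a small shell loop. The sketch below is only an illustration: replace_block is a hypothetical helper name, and it relies on ripgrep's exit status (0 when a match was found, 1 when none was found, 2 on error) to leave files without a match untouched.

```shell
# Hypothetical helper: run the rg command above on each file given as an
# argument and write the result back over the original file.
replace_block() {
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        rg --replace 'SINGLEWORD' --passthru --no-line-number \
           --multiline --multiline-dotall 'START.*?END' "$f" > "$tmp"
        # Exit status 0 means at least one match was replaced.
        if [ "$?" -eq 0 ]; then
            # Overwrite the contents rather than moving the temp file,
            # so the original file keeps its permissions.
            cat "$tmp" > "$f"
        fi
        rm -f "$tmp"
    done
}
```

It could then be called as, for example, replace_block notes.txt draft.txt (both file names are placeholders).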