Is there a command-line utility app which can find a specific block of lines in a text file, and replace it?
UPDATE (see end of question)
The text "search and replace" utility programs I've seen seem to only search on a line-by-line basis...
Is there a command-line tool which can locate one block of lines (in a text file), and replace it with another block of lines?
For example: does the text file contain this exact group of lines:
'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.
'Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!'
I want this, so that I can replace multiple lines of text in a file and know I'm not overwriting the wrong lines.
I would never replace "The Jabberwocky" (Lewis Carroll), but it makes a novel example :)
UPDATE:
..(sub-update) My following comment about when not to use sed is meant only in the sense of: don't push any tool too far beyond its design intent (I use sed quite often, and consider it invaluable.)
I just now found an interesting web page about sed and when not to use it.
So, because of all the sed answers, I'll post the link.. it is part of the sed FAQ on sourceforge
Also, I'm pretty sure there is some way diff can do the job of locating the block of text (once it's located, the replacement is quite straightforward, using head and tail) ... diff dumps all the necessary data, but I haven't yet worked out how to filter it ... (I'm still working on it)
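For what it's worth, here is a rough Python sketch of that locate-then-splice idea, using the standard difflib module in place of the diff command. The function names locate_block and splice_block are made up for illustration, and this is only one possible way to do it, not a worked-out diff filter.

```python
import difflib


def locate_block(file_lines, block_lines):
    """Return the 0-based index where block_lines starts in file_lines, or -1."""
    if not block_lines:
        return -1
    matcher = difflib.SequenceMatcher(None, file_lines, block_lines, autojunk=False)
    # Longest run of consecutive lines common to both sequences.
    m = matcher.find_longest_match(0, len(file_lines), 0, len(block_lines))
    # Only accept the match if it covers the whole block.
    return m.a if m.size == len(block_lines) else -1


def splice_block(file_lines, block_lines, new_lines):
    """Return a new list with the block replaced, or None if it was not found."""
    start = locate_block(file_lines, block_lines)
    if start < 0:
        return None
    end = start + len(block_lines)
    # Same effect as head (before the block) + new text + tail (after it).
    return file_lines[:start] + new_lines + file_lines[end:]
```

The lists would come from something like open(path).readlines(), so each element keeps its trailing newline and the comparison is exact, line by line.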
This simple Python script should do the task:
#!/usr/bin/env python
# Syntax: multiline-replace.py input.txt search.txt replacement.txt
import sys
inp = open(sys.argv[1]).read()
needle = open(sys.argv[2]).read()
replacement = open(sys.argv[3]).read()
sys.stdout.write(inp.replace(needle,replacement))
Like most other solutions, it has the disadvantage that the whole file is slurped into memory at once. For small text files, it should work well enough, however.
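Since the question is about not overwriting the wrong lines, a cautious variant could refuse to do anything unless the block occurs exactly once. This is only a sketch; replace_block_checked is a hypothetical name, not part of the script above.

```python
def replace_block_checked(text, needle, replacement):
    """Replace needle with replacement only if it occurs exactly once in text."""
    # str.count counts non-overlapping occurrences of the search block.
    count = text.count(needle)
    if count != 1:
        raise ValueError("expected exactly 1 match, found %d" % count)
    return text.replace(needle, replacement)
```

It would be called with the three file contents read as in the script above, and the ValueError tells you that the file was left alone.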
Approach 1: temporarily change newlines into something else
The following snippet swaps newlines with pipes, performs the replacement, and swaps the separators back. The utility may choke if the line it sees is extremely long. You can choose any character to swap with, as long as it appears neither in your search string nor anywhere else in the file (the final tr would turn any pre-existing occurrence into a newline).
<old.txt tr '\n' '|' |
sed 's/\(|\|^\)'\''Twas … toves|Did … Bandersnatch!'\''|/\1new line 1|new line 2|/g' |
tr '|' '\n' >new.txt
Approach 2: change the utility's record separator
Awk and perl support treating one or more blank lines as the record separator. With awk, pass -vRS= (empty RS variable). With Perl, pass -000 (“paragraph mode”) or set $/="". This is not helpful here though since you have a multi-paragraph search string.
Awk and perl also support setting any string as the record separator. Set RS or $/ to any string that is not in your search string.
<old.txt perl -pe '
BEGIN {$/ = "|"}
s/^'\''Twas … toves\nDid … Bandersnatch!'\''$/new line 1\nnew line 2/mg
' >new.txt
Approach 3: work on the whole file
Some utilities easily let you read the whole file into memory and work on it.
<old.txt perl -0777 -pe '
s/^'\''Twas … toves\nDid … Bandersnatch!'\''$/new line 1\nnew line 2/mg
' >new.txt
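The same whole-file idea can be sketched in Python with the re module. The ^ and $ anchors with re.MULTILINE play the role of the /m flag above, so the block only matches when it starts and ends on line boundaries, and re.escape keeps punctuation in the search text from being read as regex syntax. The function name replace_whole_lines is an assumption made for this example.

```python
import re


def replace_whole_lines(text, block, replacement):
    """Replace block only where it begins and ends on line boundaries.

    Returns a (new_text, number_of_replacements) tuple.
    """
    # Drop trailing newlines so that $ can anchor at the end of the last line.
    block = block.rstrip("\n")
    replacement = replacement.rstrip("\n")
    pattern = re.compile("^" + re.escape(block) + "$", re.MULTILINE)
    # A function replacement keeps any backslashes in `replacement` literal.
    return pattern.subn(lambda m: replacement, text)
```

The returned count makes it easy to check that exactly one block was replaced before writing anything back.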
Approach 4: program
Read the lines one by one. Start with an empty buffer. If you see the “'Twas” line and the buffer is empty, put it in the buffer. If you see the “Did gyre” line and there's one line in the buffer, append the current line to the buffer, and so on. If you've just appended the “Bandersnatch” line, output the replacement text and empty the buffer. If the current line didn't go into the buffer, print the buffer contents, print the current line and empty the buffer.
psusi shows a sed implementation. In sed, the buffer concept is built-in; it's called the hold space. In awk or perl, you'd just use a variable (perhaps two, one for the buffer contents and one for the number of lines).
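A minimal Python sketch of this line-by-line approach follows. It swaps the buffer described above for a sliding window of the last few lines, which is a small variation chosen so that a repeated first line (for example two “'Twas” lines in a row) does not cause a match to be missed. The name replace_block_streaming is hypothetical.

```python
from collections import deque


def replace_block_streaming(lines, block, replacement):
    """Yield lines, with each occurrence of block replaced by replacement.

    lines, block and replacement are iterables of lines; lines are compared
    exactly, including their trailing newline.
    """
    block = list(block)
    if not block:
        yield from lines
        return
    window = deque()  # holds at most len(block) pending lines
    for line in lines:
        window.append(line)
        if len(window) < len(block):
            continue
        if list(window) == block:
            # Full match: emit the replacement instead of the buffered lines.
            yield from replacement
            window.clear()
        else:
            # No match starting at the oldest line, so it is safe to emit it.
            yield window.popleft()
    # Lines still pending at end of input can no longer form a match.
    yield from window
```

Passing an open file object as lines keeps memory use down to the size of the block, for example sys.stdout.writelines(replace_block_streaming(open("old.txt"), search_lines, new_lines)), where search_lines and new_lines are lists you have read beforehand.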
I was sure there had to be a way to do this with sed. After some googling I came across this:
http://austinmatzko.com/2008/04/26/sed-multi-line-search-and-replace/
Based on that I ended up writing:
sed -n '1h;1!H;${;g;s/foo\nbar/jar\nhead/g;p;}' < x
Which correctly took the contents of x:
foo
bar
And spit out:
jar
head