Why does redirecting sed output to same input file make my machine unresponsive?
I was trying sed to replace some keywords in a large file (100 MB). I was unaware of the -i (in-place) option, so my first attempt was to redirect like this:
sed 's/original/edited/g' file.log >> file.log
What happened after that was that my PC ground to a halt, with almost no keyboard response. I tried a different console (Ctrl+Alt+F1), but after slowly entering my user name, it halted too. With no working keyboard, my only option was to hardware-reset the machine. After logging back in, I saw that file.log had grown to about 8 GB.
I would really like to understand why executing that command was able to make the system so unresponsive, and whether mechanisms exist at the system level to trigger alerts and kill the offending process.
Your sed command was trying to read the file it was appending to. It will never reach end-of-file, but will eat a lot of CPU time and disk space trying. That's why ^C (interrupt the current process) was invented.
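You can watch this feedback loop safely on a throwaway file. This is a hypothetical demonstration (the file name demo.log and the replacement text are made up): the replacement is longer than the match, so sed's writes outpace its reads and the file grows until timeout kills the process.

```shell
# Seed a throwaway file (~0.9 MB), large enough that sed's output buffer
# flushes many times before sed reaches the original end of the file.
yes original | head -n 100000 > demo.log

# The replacement is longer than the match, so sed can never catch up with
# its own appended output; 'timeout' kills it after one second.
timeout 1 sed 's/original/original-edited/g' demo.log >> demo.log

ls -lh demo.log   # far larger than the ~0.9 MB we seeded
rm demo.log
```

Without the timeout, this loop runs until the disk fills, which is exactly what made the machine unresponsive.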
Appending back to the file you read from is never a good idea, as you will end up with an ever-growing file. If you really want to write back into the file, you should use the -i flag:
sed -i 's/original/edited/g' file.log
or, if you want it to create a backup before making changes, you can add a file suffix to the -i flag:
sed -i.bak 's/original/edited/g' file.log
This creates a backup called file.log.bak before making the changes. What you did there, appending to the file you are reading from, is what programmer slang calls a data race: a process races its own output as input, competing for the same data source. That feedback loop is also why your machine came to a halt.
As has already been said, >> appends to the file, so your sed command will sit there reading the lines it has just output, and then outputting them some more. If you wanted to replace your file in place, > still wouldn't work either, because the shell truncates the file before sed ever reads it. But you're aware of sed's -i option, which is definitely the one you want.
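The truncation behaviour of > is easy to demonstrate on a throwaway file (demo.txt is hypothetical):

```shell
# '>' makes the shell open and empty demo.txt before sed starts,
# so sed reads an empty file and writes nothing back.
printf 'original\n' > demo.txt
sed 's/original/edited/g' demo.txt > demo.txt
wc -c demo.txt    # prints: 0 demo.txt
rm demo.txt
```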
If, however, you're absolutely sure that you want to append to a file you're reading as a stream, and only want to do one pass of this, consider using sponge from the moreutils package:
sed 's/original/edited/g' file.log | sponge >> file.log
sponge reads from stdin into memory until EOF, then dumps all its contents to stdout. So sed will hit the end of the file, stop reading, and close it, and only then will sponge start appending to it.
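For completeness: sponge is more often used to overwrite the input file in place, by passing it the file name directly instead of using shell redirection. A sketch of that form, assuming moreutils is installed:

```shell
# sponge soaks up all of sed's output first, and only then opens
# file.log for writing, so the read and the write never overlap.
sed 's/original/edited/g' file.log | sponge file.log
```

This replaces the file's contents rather than appending, which is usually what a search-and-replace actually wants.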