Reading and writing a file: tee command
It is well known that a command like this:
cat filename | some_sed_command >filename
erases file filename, as the output redirection, being executed before the command, causes filename to be truncated.
One could solve the issue in the following way:
cat file | some_sed_command | tee file >/dev/null
but I'm not sure this would work in every case: what happens if file (and the result of the sed command) is very big? How can the operating system avoid overwriting content that has not been read yet? I see that there is also a sponge command which should work in every case: is it "safer" than tee?
One could solve the issue in the following way:
cat file | some_sed_command | tee file >/dev/null
No.
The chances that file will be truncated drop, but there's no guarantee that cat file | some_sed_command | tee file >/dev/null will not truncate file.
It all depends on which command is processed first: contrary to what one might expect, the commands in a pipeline are not processed left to right. They are started concurrently, and there's no guarantee about which one will run first, so one might as well think of it as randomly picked and never rely on the shell not picking the offending one.
Since the chances of the offending command being picked first among three commands are lower than among two commands, it's less likely that file will be truncated, but it's still going to happen.
script.sh:
#!/bin/bash
for ((i=0; i<100; i++)); do
cat >file <<-EOF
foo
bar
EOF
cat file |
sed 's/bar/baz/' |
tee file >/dev/null
[ -s file ] &&
echo 'Not truncated' ||
echo 'Truncated'
done |
sort |
uniq -c
rm file
% bash script.sh
93 Not truncated
7 Truncated
% bash script.sh
98 Not truncated
2 Truncated
% bash script.sh
100 Not truncated
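The race can also be made deterministic by delaying one side of the pipeline on purpose. The following is only a sketch, assuming bash, a throwaway directory created with mktemp -d, and an input small enough to fit in the pipe buffer; the sleep calls and the case1/case2 variable names are illustrative additions, not part of the original test.

```shell
#!/bin/bash
# Work in a throwaway directory so no real file is touched.
work=$(mktemp -d) && cd "$work" || exit 1

# Case 1: delay the reader. tee opens (and truncates) file long before
# cat gets to read it, so the content is lost.
printf 'foo\nbar\n' >file
{ sleep 1; cat file; } | sed 's/bar/baz/' | tee file >/dev/null
if [ -s file ]; then case1='Not truncated'; else case1='Truncated'; fi

# Case 2: delay the writer. cat and sed finish with this small input
# before tee opens file, so the edit survives.
printf 'foo\nbar\n' >file
cat file | sed 's/bar/baz/' | { sleep 1; tee file >/dev/null; }
if [ -s file ]; then case2='Not truncated'; else case2='Truncated'; fi

echo "reader delayed: $case1"
echo "writer delayed: $case2"
```

On a typical system the first case should report Truncated and the second Not truncated, which is the same race as in script.sh with the outcome forced by hand.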
So never use something like cat file | some_sed_command | tee file >/dev/null. Use sponge as Oli suggested.
As an alternative, for tighter environments and/or relatively small files, one may use a here string and a command substitution to read the file before any command is run:
$ cat file
foo
bar
$ for ((i=0; i<100; i++)); do <<<"$(<file)" sed 's/bar/baz/' >file; done
$ cat file
foo
baz
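The same idea can be spelled out in two steps, which may be easier to read. This is a minimal sketch, assuming bash and a file small enough to hold in memory; the variable name content and the throwaway directory are illustrative. Note that command substitution strips trailing newlines and cannot hold NUL bytes, so this is only suitable for ordinary text files.

```shell
#!/bin/bash
# Work in a throwaway directory so no real file is touched.
work=$(mktemp -d) && cd "$work" || exit 1
printf 'foo\nbar\n' >file

# Step 1: read the whole file into memory. Nothing has opened file for
# writing yet, so there is no race.
content=$(<file)

# Step 2: only now run the pipeline that truncates and rewrites file.
# printf puts back the single trailing newline that $(...) stripped.
printf '%s\n' "$content" | sed 's/bar/baz/' >file

cat file
```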
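For sed specifically, you can use its -i (in-place) option. It saves the result back under the name of the file it read (GNU sed does this by writing a temporary file and renaming it over the original), e.g.: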
sed -i 's/ /-/g' filename
If you want to do something beefier, assuming you're doing more than sed, yes, you can buffer the whole thing with sponge (from the moreutils package), which will "soak up" all of stdin before writing out to the file. It's like tee but with less functionality. For basic usage though, it's pretty much a drop-in replacement:
cat file | some_sed_command | sponge file
Is that safer? Definitely. It probably has limits though, so if you're doing something colossal (and can't edit in place with sed), you might want to write your edits to a second file and then mv that file back to the original filename. That should be atomic as long as both files are on the same filesystem (so anything depending on these files won't break if it needs constant access).
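The second-file-then-mv approach could look roughly like this. It is a sketch under a few assumptions: bash, a mktemp that accepts a template path, and a hypothetical helper name edit_in_place that is not a standard command. The temporary file is created next to the target so that mv is a rename within one filesystem.

```shell
#!/bin/bash
# Hypothetical helper: edit_in_place FILE COMMAND [ARGS...]
# Runs COMMAND with FILE on stdin and replaces FILE only if COMMAND succeeds.
edit_in_place() {
    local target=$1; shift
    local tmp
    # Same directory as the target, so the final mv does not cross filesystems.
    tmp=$(mktemp "$(dirname -- "$target")/.tmp.XXXXXX") || return 1
    if "$@" <"$target" >"$tmp"; then
        mv -- "$tmp" "$target"
    else
        # Leave the original untouched and clean up on failure.
        rm -f -- "$tmp"
        return 1
    fi
}

# Demo on a throwaway file.
demo_dir=$(mktemp -d)
printf 'foo\nbar\n' >"$demo_dir/file"
edit_in_place "$demo_dir/file" sed 's/bar/baz/'
cat "$demo_dir/file"
```

One caveat with this sketch: mktemp creates the new file with restrictive permissions (0600), so the ownership and mode of the original are not carried over unless you restore them before the mv.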