Reading and writing a file: tee command

It is well known that a command like this:

cat filename | some_sed_command >filename

erases file filename, as the output redirection, being executed before the command, causes filename to be truncated.
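The truncation is easiest to see with a single command, where no race is involved. A minimal sketch, assuming a scratch file created with mktemp (the variable name demo is only for illustration):

```shell
demo=$(mktemp)
printf 'foo\nbar\n' >"$demo"

# The shell opens "$demo" for writing, truncating it, before sed is executed,
# so sed finds an already-empty file and writes nothing back.
sed 's/bar/baz/' "$demo" >"$demo"

# Reports a size of 0 bytes.
wc -c <"$demo"
```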

One could solve the issue in the following way:

cat file | some_sed_command | tee file >/dev/null

but I'm not sure this would work in every case: what happens if file (and the result of the sed command) is very big? How can the operating system avoid overwriting content that has not been read yet? I see that there is also a sponge command which should work in every case: is it "safer" than tee?


One could solve the issue in the following way:

cat file | some_sed_command | tee file >/dev/null

No.

The chances file will be truncated drop, but there's no guarantee cat file | some_sed_command | tee file >/dev/null will not truncate file.

It all depends on timing. Contrary to what one may expect, the commands in a pipeline are not run one after another, left to right: the shell starts them all concurrently. tee truncates file as soon as it opens it, and there's no guarantee that cat has finished reading file (or has even opened it) by then, so never rely on the read happening first.

With >file, the shell truncates file before sed is even executed; with tee, the truncation happens only once tee has been started and opens file itself. That extra delay makes it more likely that cat reads file before it is truncated, so it's less likely that file will end up empty, but it's still going to happen.

script.sh:

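#!/bin/bash
# Run the pipeline 100 times and count how often file ends up empty.
for ((i=0; i<100; i++)); do
    # Recreate the input file on each iteration.
    printf 'foo\nbar\n' >file
    cat file |
        sed 's/bar/baz/' |
        tee file >/dev/null
    if [ -s file ]; then
        echo 'Not truncated'
    else
        echo 'Truncated'
    fi
done |
    sort |
    uniq -c
rm file

% bash script.sh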
 93 Not truncated
  7 Truncated
% bash script.sh
 98 Not truncated
  2 Truncated
% bash script.sh
100 Not truncated

So never use something like cat file | some_sed_command | tee file >/dev/null. Use sponge as Oli suggested.

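As an alternative, for tighter environments and/or relatively small files, one may use a here string and a command substitution to read the file before any command is run. Note that the command substitution strips trailing newlines (the here string adds a single one back) and can't hold NUL bytes, so this only suits text files: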

$ cat file
foo
bar
$ for ((i=0; i<100; i++)); do <<<"$(<file)" sed 's/bar/baz/' >file; done
$ cat file
foo
baz

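For sed specifically, you can use its -i (in-place) option. GNU sed writes the result to a temporary file and then renames it over the original, so the input is never truncated while it's still being read, e.g.: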

sed -i 's/ /-/g' filename
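If you'd like a safety net, -i also accepts a backup suffix. A small sketch, assuming a scratch directory and a made-up file name (notes.txt); as far as I know the attached -i.bak form is accepted by both GNU and BSD sed:

```shell
workdir=$(mktemp -d)
printf 'a b c\n' >"$workdir/notes.txt"

# Edit in place, keeping the untouched original as notes.txt.bak
sed -i.bak 's/ /-/g' "$workdir/notes.txt"

cat "$workdir/notes.txt"      # a-b-c
cat "$workdir/notes.txt.bak"  # a b c
```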

If you want to do something beefier, assuming you're doing more than sed, yes, you can buffer the whole thing with sponge (from the moreutils package) which will "soak up" all the stdin before writing out to the file. It's like tee but with less functionality. For basic usage though, it's pretty much a drop-in replacement:

cat file | some_sed_command | sponge file

Is that safer? Definitely. It probably has limits though, so if you're doing something colossal (and can't edit in place with sed), you might want to write your edits to a second file and then mv that file over the original filename. As long as both files are on the same filesystem, the mv is an atomic rename, so anything that needs constant access to the file sees either the old version or the new one, never a half-written one.
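The second-file approach could look roughly like this. It's only a sketch, assuming bash and mktemp; rewrite_via_temp is a hypothetical helper name, not an existing tool, and the scratch file is there just for the demo:

```shell
# Hypothetical helper: run a filter over a file and replace the file
# only if the filter succeeded.
rewrite_via_temp() {
    local target=$1
    shift
    local tmp
    # Create the temp file next to the target so that mv stays on the
    # same filesystem and is a plain rename.
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if "$@" <"$target" >"$tmp"; then
        mv -- "$tmp" "$target"
    else
        # Filter failed: leave the original untouched.
        rm -f -- "$tmp"
        return 1
    fi
}

# Demo on a scratch file
scratch=$(mktemp -d)
printf 'foo\nbar\n' >"$scratch/file"
rewrite_via_temp "$scratch/file" sed 's/bar/baz/'
cat "$scratch/file"
```

One thing to keep in mind: the file that ends up under the original name is the one mktemp created, so its permissions and ownership are those of the temporary file, not of the original.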