Is redirection with `>>` equivalent to `>` when the target file doesn't yet exist?

Consider a shell like Bash or sh. The basic difference between > and >> manifests itself when the target file already exists:

  • > truncates the file to zero size, then writes;
  • >> doesn't truncate, it writes (appends) to the end of the file.

If the file does not exist, it is created with zero size and then written to. This is true for both operators, so it may seem that they are equivalent when the target file doesn't yet exist.
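
A quick check of that apparent equivalence, as a minimal sketch (the scratch directory and the file names are made up for illustration):

```shell
#!/bin/sh
# Sketch: with no pre-existing target, > and >> seem to produce identical files.
dir=$(mktemp -d)                   # scratch directory, removed at the end

echo 'hello' >  "$dir/with_gt"     # file is created, then written
echo 'hello' >> "$dir/with_gtgt"   # file is created, then appended to (from size zero)

# cmp -s is silent and only sets the exit status.
if cmp -s "$dir/with_gt" "$dir/with_gtgt"; then
    same=yes
else
    same=no
fi
echo "identical contents: $same"
rm -rf "$dir"
```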

Are they really?


Solution 1:

tl;dr

No. >> opens the file in append mode (O_APPEND), which is essentially "always seek to the end of the file before writing", while > writes at its own offset, which just advances past whatever it last wrote.


Full answer

(Note: all my tests were done on Debian GNU/Linux 9.)

Another difference

No, they are not equivalent. There is another difference. It may manifest itself regardless of whether the target file existed before or not.

To observe it, run a process that generates data and redirect its output to a file with > or >> (e.g. pv -L 10k /dev/urandom > blob). Let it run and change the size of the file from outside (e.g. with truncate). You will see that > keeps its (growing) offset while >> always appends to the end.

  • If you truncate the file to a smaller size (it can be zero size)
    • > won't care, it will write at its desired offset as if nothing happened. Just after truncating, the offset is beyond the end of the file, so the next write makes the file regain its old size and grow further; the missing data is filled with zeros (in a sparse way, if possible);
    • >> will append to the new end, the file will grow from its truncated size.
  • If you enlarge the file
    • > won't care, it will write at its desired offset as if nothing happened. Just after the size change, the offset is somewhere inside the file, so the file stops growing for a while until the offset reaches the new end; then the file grows normally;
    • >> will append to the new end, the file will grow from its enlarged size.
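
The truncation case can be reproduced deterministically without pv. The following is only a sketch under a few assumptions: one descriptor is held open per operator to stand in for the long-running process, `: > file` plays the role of truncate, and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: write, truncate the file from "outside", write again.
dir=$(mktemp -d)

exec 3> "$dir/gt"          # behaves like: some_process > file
printf 'AAAA' >&3          # offset of descriptor 3 is now 4
: > "$dir/gt"              # truncate to zero size via a separate open
printf 'BBBB' >&3          # written at offset 4, leaving a 4-byte hole of zeros
exec 3>&-

exec 4>> "$dir/gtgt"       # behaves like: some_process >> file
printf 'AAAA' >&4
: > "$dir/gtgt"            # same truncation
printf 'BBBB' >&4          # append mode: written at the new end (offset 0)
exec 4>&-

# $(( ... )) strips the padding some wc implementations add.
size_gt=$(( $(wc -c < "$dir/gt") ))
size_gtgt=$(( $(wc -c < "$dir/gtgt") ))
echo "size with >:  $size_gt"
echo "size with >>: $size_gtgt"
rm -rf "$dir"
```

This should report 8 bytes for > (four zero bytes followed by BBBB) and 4 bytes for >> (just BBBB).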

Another example is to append something extra (with a separate >>) while the data-generating process is running and writing to the file. This is similar to enlarging the file.

  • The generating process with > will write at its desired offset and eventually overwrite the extra data.
  • The generating process with >> will skip the new data and append past it (a race condition may occur and the two streams may get interleaved, but no data should be overwritten).
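
The two bullets above can be sketched like this. The writes are done one after another on purpose, so there is no race in this example; the held-open descriptors and the file names are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: a second writer appends "extra" between two writes of the generator.
dir=$(mktemp -d)

exec 3> "$dir/gt"                  # generator started with >
printf 'AAAA' >&3
printf 'extra' >> "$dir/gt"        # someone else appends; file is "AAAAextra"
printf 'BBBB' >&3                  # generator continues at offset 4, over "extr"
exec 3>&-

exec 4>> "$dir/gtgt"               # generator started with >>
printf 'AAAA' >&4
printf 'extra' >> "$dir/gtgt"      # file is "AAAAextra"
printf 'BBBB' >&4                  # generator appends past the extra data
exec 4>&-

result_gt=$(cat "$dir/gt")
result_gtgt=$(cat "$dir/gtgt")
echo "with >:  $result_gt"
echo "with >>: $result_gtgt"
rm -rf "$dir"
```

With > the result should be AAAABBBBa (most of the extra data is gone); with >> it should be AAAAextraBBBB.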

Example

Does it matter in practice? There is this question:

I'm running a process which produces a lot of output on stdout. Sending it all to a file [...] Can I use some kind of log rotation program?

This answer says the solution is logrotate with the copytruncate option, which acts like this:

Truncate the original log file in place after creating a copy, instead of moving the old log file and optionally creating a new one.

According to what I wrote above, redirecting with > will make the truncated log regain its old size as soon as the process writes again. Sparseness will save the day, so no significant disk space should be wasted. Nevertheless, each consecutive log will have more and more leading zeros in it that are completely unnecessary.

But if logrotate creates copies without preserving sparseness, these leading zeros will need more and more disk space every time a copy is made. I haven't investigated the tool's behavior; it may be smart enough about sparseness, or it may compress on the fly (if compression is enabled). Still, the zeros can only cause trouble or be neutral at best; there is nothing good in them.

In this case using >> instead of > is significantly better, even if the target file does not exist yet.
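
To see the leading zeros accumulate, here is a rough emulation of the copy-then-truncate step. It is not logrotate itself: cp and `: > file` stand in for what copytruncate is described to do, and the log name and record format are hypothetical.

```shell
#!/bin/sh
# Sketch: emulate copytruncate (copy, then truncate in place) against a writer
# that was started with >.
dir=$(mktemp -d)
log="$dir/app.log"            # hypothetical log file

exec 3> "$log"                # the long-running writer, redirected with >
nul_counts=""
for i in 1 2 3; do
    printf 'line %04d\n' "$i" >&3                   # one 10-byte record
    # Count zero bytes currently in the live log (tr keeps only NULs).
    nuls=$(( $(tr -cd '\000' < "$log" | wc -c) ))
    nul_counts="$nul_counts $nuls"
    cp "$log" "$log.$i"       # "rotation": keep a copy ...
    : > "$log"                # ... and truncate the original in place
done
exec 3>&-

echo "zero bytes in the live log before each rotation:$nul_counts"
rm -rf "$dir"
```

The counts should come out as 0, 10 and 20: every rotation adds the size of everything written so far as padding. Opening the log with `exec 3>> "$log"` instead should keep the count at 0 every time.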


Performance

As we can see, the two operators act differently not only when they begin but also later. This may cause some (subtle?) performance difference. For now I have no meaningful test results to support or disprove it, but I think you shouldn't automatically assume their performance is the same in general.