What does this strange symbol ":>" in bash mean

I found something in a script, but not belonging to the main script. There was :> in a line.

Could you explain to me what it means?

:> file
while read A B C D E; do echo "$A;$B;$D;$E;$C" >> file; done < otherfile

There was :> in a line of a bash script. What does it mean?

:> file

It is a shortcut for:

If file does not exist then create it else truncate it to 0 bytes.

This means you can be sure that file exists and it is empty.
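A quick way to convince yourself of both cases (the file is missing, the file already has data) is a sketch like this. The scratch directory and the variable names are made up for the demo and are not from the script in the question.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory so the demo touches nothing real.
tmpdir=$(mktemp -d)
f="$tmpdir/file"

:> "$f"                               # file did not exist: created, empty
size_new=$(( $(wc -c < "$f") ))

printf 'some data\n' > "$f"           # now put 10 bytes into it
size_full=$(( $(wc -c < "$f") ))

:> "$f"                               # file exists: truncated back to 0 bytes
size_after=$(( $(wc -c < "$f") ))

echo "new=$size_new full=$size_full after=$size_after"
rm -rf "$tmpdir"
```

The $(( ... )) wrapper is only there to strip the padding some wc implementations print around the number.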

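You can also use > file, but :> file is more portable: a bare redirection with no command is not handled the same way by every shell (zsh, for instance, runs cat by default, so > file sits waiting for input on stdin).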

See the Stack Overflow question What Is the Purpose of the ':' (colon) GNU Bash Builtin? for more information.

It looks like a fancy way of creating a new file. In bash : is a null command:

$ type : 
: is a shell builtin 
$ help : 
:: :
    Null command.

    No effect; the command does nothing.

    Exit Status:
    Always succeeds.

The > redirects the (empty) output of : to a file.

: is another name for true. Both are shell builtins in bash, but there's no /bin/:, only a /bin/true. Output redirection causes the shell to open(2) the file with O_CREAT|O_TRUNC. If nothing is written, it stays at zero length.

Putting those two pieces together, :> file is a fairly common idiom for truncating files. Most people would try to make it less weird-looking by writing : >file, though.
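As a rough illustration that the different spellings behave the same, here is a small sketch. The file names a, b and c and the scratch directory are hypothetical, picked just for this demo.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)   # hypothetical scratch directory for the demo

# Start with three non-empty files.
for name in a b c; do
    printf 'old contents\n' > "$tmpdir/$name"
done

:> "$tmpdir/a"        # the form from the question
: > "$tmpdir/b"       # same thing, spaced out to look less odd
true > "$tmpdir/c"    # true prints nothing either, so the effect is the same

# Count how many files still have contents (-s: size greater than zero).
nonempty=0
for name in a b c; do
    if [ -s "$tmpdir/$name" ]; then
        nonempty=$((nonempty + 1))
    fi
done
echo "non-empty files left: $nonempty"

rm -rf "$tmpdir"
```

In every case it is the shell's redirection that does the truncating; the command in front of it only has to exist and print nothing.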

Since you asked in a comment about the 2nd line, I'll turn my comments into an answer (even though you didn't ask about it in your question).

The 2nd line is a loop that reads lines from otherfile into some named variables. The loop body uses echo to print them with ; separators instead of whatever whitespace they had before. file is closed and re-opened (for append) each iteration, because the redirect is inside the loop. Using while read -r ...; do ...; done <otherfile >file would suck less, and avoid the need to truncate file first. With -r, read doesn't treat \ as an escape character.
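Spelled out, the loop with both redirections on the done line could look something like this. The sample data and the scratch directory are invented for the demo, and I've swapped echo for printf because some echo implementations interpret backslashes in their arguments.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)   # hypothetical scratch directory and sample data
printf '%s\n' 'a b c d e' '1 2 3 4 5 6 7' 'p\q r s t u' > "$tmpdir/otherfile"

# The output file is opened (and truncated) once for the whole loop,
# so no separate ":> file" is needed beforehand.
while read -r A B C D E; do
    printf '%s\n' "$A;$B;$D;$E;$C"
done < "$tmpdir/otherfile" > "$tmpdir/file"

result=$(cat "$tmpdir/file")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The second sample line shows the last variable (E) soaking up the rest of the line, and the third shows the backslash surviving because of -r.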

Text processing in bash is quite slow. Part of that is unavoidable: read has to go one byte at a time (one read(2) system call per byte) to avoid overshooting the end of a line. It would be better to use the right tool for the job:

awk -vOFS=';' '{ print $1, $2, $4, $5, $3 }' -- otherfile  >file

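-- is meant to keep the script from breaking if otherfile is named something silly like --version. (POSIX utilities stop parsing options at the first operand, which for awk is the program text, so an awk may instead treat a -- placed after the program as a filename operand; check your awk's behavior before relying on it.)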

Setting the Output Field Separator to ; means you can just pass multiple fields as args to print. Shell read assigns the whole rest of the line with whitespace to the last variable, but there's no way to tell awk to only split into 5. If that's important, maybe just keep using a bash loop, because it's inconvenient in awk. Perl makes this easy, since its split can take a max-fields arg, but it's a lot slower to start up than awk.
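To make the difference between the two concrete, here is a sketch that runs both on the same made-up input. The sample lines and scratch directory are assumptions for the demo. The input path here is absolute, so it cannot be mistaken for an option and I've left out the --.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)   # hypothetical scratch directory and sample data
printf '%s\n' 'a b c d e' '1 2 3 4 5 6 7' > "$tmpdir/otherfile"

# awk: $5 is exactly the fifth field; fields 6 and 7 are dropped.
awk_out=$(awk -v OFS=';' '{ print $1, $2, $4, $5, $3 }' "$tmpdir/otherfile")

# shell read: the last variable (E) receives the whole rest of the line.
read_out=$(while read -r A B C D E; do
               printf '%s\n' "$A;$B;$D;$E;$C"
           done < "$tmpdir/otherfile")

printf 'awk:\n%s\nread:\n%s\n' "$awk_out" "$read_out"
rm -rf "$tmpdir"
```

On lines with exactly five fields the two agree; they only differ once there is extra text after the fifth field.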

Actually, it turned out to be not that hard, just an ugly regex to write. To get rest-of-the-line instead of $5 in awk, looping over the fields doesn't work, because it loses their original whitespace. My first viable idea is to use gensub (a GNU awk extension) on $0 (the whole line) to remove the first 4 fields (i.e. non-space followed by space), leaving everything else:

awk -vOFS=';' '{ tail = gensub("[[:space:]]*([^[:space:]]+[[:space:]]+){4}", "", 1); print $1, $2, $4, tail, $3 }' -- otherfile >file

I got it right on the first try, but the fact that I was impressed with myself for that says something about the readability of that awk code. >.<

Note how it's the same print as before, but with tail in place of $5.

echo 'A  B c DD    e      f g    f' | 
  awk -vOFS=\; '{ tail = gensub("[[:space:]]*([^[:space:]]+[[:space:]]+){4}", "", 1);
   print $1, $2, $4, tail, $3 }'

A;B;DD;e       f g    f;c

This would be more impressive if I could copy/paste a literal tab and show that it came through in the output. Type one in bash with ^Q: ctrl-Q means Quote the next keypress as a literal character, since bash's emacs-style line editing is the same as actual emacs for this. (ctrl-V does the same thing, and is the safer choice on terminals that reserve ctrl-Q for flow control.)
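If gensub isn't available (it is specific to GNU awk), one possible workaround is to strip the first four fields from a copy of $0 with plain sub() in a loop. This is a different technique from the gensub one-liner above, not a drop-in equivalent, and it only treats spaces and tabs as field separators. The sample line, including its tab, is made up for the demo.

```shell
#!/usr/bin/env bash
# Hypothetical input: the tail contains a real tab and a run of spaces.
tail_in=$(printf 'e\tf  g')
line="A  B c DD   $tail_in"

result=$(printf '%s\n' "$line" | awk -v OFS=';' '{
    tail = $0                      # work on a copy so $1..$4 stay intact
    for (i = 0; i < 4; i++)        # drop one leading field plus its trailing blanks, four times
        sub(/^[ \t]*[^ \t]+[ \t]+/, "", tail)
    print $1, $2, $4, tail, $3
}')

printf '%s\n' "$result"
```

Like the gensub version, this makes no special provision for lines with fewer than five fields.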

http://mywiki.wooledge.org/BashFAQ has some useful stuff about scripting in ways that won't break no matter what data or filenames you throw at the script.
