What is the general consensus on "Useless use of cat"?

When I pipe multiple Unix commands such as grep, sed, tr, etc., I tend to specify the input file that is being processed using cat. So something like cat file | grep ... | awk ... | sed ... .

But recently after a couple of comments left on my answers indicating that this was a useless use of cat, I thought I would ask the question here.

I looked up the issue and came across Wikipedia's article on UUOC and The Useless Use of Cat Award and it seems to me that the arguments made are from the perspective of efficiency.

The closest question I came across here was this one: Is it wasteful to call cat? – but it's not quite what I'm asking.

I guess what the UUOC camp suggests is to use cmd1 args < file | cmd2 args | cmd3 ... or, if the command has an option to read from a file, to pass the file in as an argument.
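
As a minimal sketch of those three forms side by side, assuming a throwaway sample file made with mktemp and using grep and tr as stand-ins for cmd1 and cmd2 (the file contents are made up for illustration):

```shell
# Hypothetical sample file; name and contents are for illustration only.
f=$(mktemp)
printf 'apple\nbanana\ncherry\n' > "$f"

# 1. cat feeding the pipeline
out_cat=$(cat "$f" | grep an | tr 'a-z' 'A-Z')

# 2. input redirection on the first command
out_redirect=$(grep an < "$f" | tr 'a-z' 'A-Z')

# 3. file name passed as an argument (grep accepts one)
out_arg=$(grep an "$f" | tr 'a-z' 'A-Z')

printf '%s\n' "$out_cat" "$out_redirect" "$out_arg"
rm -f "$f"
```

All three produce the same text here; they differ only in how the first command gets its input.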

But to me cat file | cmd1 ... | cmd2 seems much easier to read and understand. I don't have to remember different ways of sending input files to different commands, and the process flows logically from left to right. First input, then the first process ... and so on.

Am I failing to understand what arguments are being made about the useless use of cat? I understand that if I'm running a cron job that runs every 2 seconds that does a lot of processing, then in that case cat might be wasteful. But otherwise what's the general consensus on the use of cat?


Solution 1:

It's useless in the sense that using it like that doesn't accomplish anything (i.e. producing proper results) that the other, possibly more efficient, options can't accomplish too.

But cat is way more powerful than just cat somefile. Consult man cat or read what I wrote in this answer. But if you absolutely positively only need the contents of a single file, you might get some performance advantage from not using cat to get at the file contents.
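
To illustrate, here is a small sketch of cases where cat is doing real work rather than just relaying one file. The directory and file names are hypothetical, made up for the example:

```shell
# Hypothetical scratch directory with two small files.
d=$(mktemp -d)
printf 'one\n' > "$d/a.txt"
printf 'two\n' > "$d/b.txt"

# Concatenating several files is cat's actual job; a single
# input redirection cannot replace it here.
joined=$(cat "$d/a.txt" "$d/b.txt" | sort -r)

# "-" stands for standard input, so piped data can be spliced
# in between files.
spliced=$(printf 'middle\n' | cat "$d/a.txt" - "$d/b.txt")

printf '%s\n' "$joined" "$spliced"
rm -rf "$d"
```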

Regarding readability, this depends on your personal tastes. I like cating files into other commands for the same reason, especially if the performance aspects are negligible.

It also depends on what you're scripting. If it's your own shell and convenience methods for your desktop machine, nobody except you will care. If you stumble upon a case where the next tool in the chain would be better off being able to seek, and distribute this as a frequently used piece of software on some minimal Linux system on a low-performance router or similar device with real limits on processing ability, that's different. It always depends on the context.

Solution 2:

I often use cat file | myprogram in examples. Sometimes I am accused of a Useless use of cat (http://www.iki.fi/era/unix/award.html). I disagree for the following reasons:

It is easy to understand what is going on.

When reading a UNIX command you expect a command followed by arguments followed by redirection. It is possible to put the redirection anywhere but it is rarely seen - thus people will have a harder time reading the example. I believe

    cat foo | program1 -o option -b option | program2

is easier to read than

    program1 -o option -b option < foo | program2

If you move the redirection to the start you are confusing people who are not used to this syntax:

    < foo program1 -o option -b option | program2

and examples should be easy to understand.

It is easy to change.

If you know the program can read from cat, you can normally assume it can read the output from any program that outputs to STDOUT, and thus you can adapt it for your own needs and get predictable results.
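
A minimal sketch of that kind of adaptation, assuming a made-up sample file and using printf as a stand-in for whatever command would produce your real data:

```shell
# Hypothetical sample file for the cat version.
f=$(mktemp)
printf 'b\na\nb\n' > "$f"

# Starting point: cat supplies the data.
from_cat=$(cat "$f" | sort | uniq -c)

# Same pipeline with cat swapped for another producer; printf
# stands in for any command that writes to STDOUT.
from_cmd=$(printf 'b\na\nb\n' | sort | uniq -c)

printf '%s\n' "$from_cat" "$from_cmd"
rm -f "$f"
```

Only the first stage changes; the rest of the pipeline is untouched.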

It stresses that the program does not fail if STDIN is not a regular file.

It is not safe to assume that if program1 < foo works then cat foo | program1 will also work. However, it is in practice safe to assume the opposite. This program works if STDIN is a file, but fails if the input is a pipe, because it uses seek:

    # works
    < foo perl -e 'seek(STDIN,1,1) || die;print <STDIN>'

    # fails
    cat foo | perl -e 'seek(STDIN,1,1) || die;print <STDIN>'
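
The difference the program sees can also be sketched without perl. This assumes a system where /dev/stdin refers to the current standard input (true on Linux and most modern Unix systems); kind_of_stdin is a hypothetical helper written for this example:

```shell
# Hypothetical helper: report what kind of object STDIN is.
# Assumes /dev/stdin resolves to the process's standard input.
kind_of_stdin() {
    if [ -f /dev/stdin ]; then
        echo "regular file"
    elif [ -p /dev/stdin ]; then
        echo "pipe"
    else
        echo "other"
    fi
}

f=$(mktemp)
printf 'x\n' > "$f"

via_redirect=$(kind_of_stdin < "$f")   # seekable regular file
via_cat=$(cat "$f" | kind_of_stdin)    # non-seekable pipe

printf '%s\n' "$via_redirect" "$via_cat"
rm -f "$f"
```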

Performance penalty is often not measurable.

I have looked at the performance penalty at http://oletange.blogspot.dk/2013/10/useless-use-of-cat.html. The conclusion: don't use cat file | if the complexity of the processing is similar to a simple grep and performance matters more than readability. For other situations cat file | is fine.

Here is an example where | cat increases performance by 50%: https://unix.stackexchange.com/questions/614154/useless-use-of-cat-increases-performance-why

Solution 3:

In everyday command-line use it's not really much different. You especially aren't going to notice any speed difference, since the CPU time you save by not using cat is time your CPU would just have spent idle anyway. Even if you're looping through hundreds or thousands (or even hundreds of thousands) of items, in all practicality it's not going to make much difference unless you're on a very loaded system (load average / number of CPUs > 1).
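
As a rough sketch of that "loaded system" check, assuming Linux, where /proc/loadavg and /proc/cpuinfo are available (other systems would need uptime or similar instead):

```shell
# Linux-specific assumption: /proc/loadavg and /proc/cpuinfo exist.
read -r load1 _ < /proc/loadavg              # 1-minute load average
ncpu=$(grep -c '^processor' /proc/cpuinfo)   # number of logical CPUs
[ "$ncpu" -gt 0 ] || ncpu=1                  # guard against odd cpuinfo formats

ratio=$(awk -v l="$load1" -v n="$ncpu" 'BEGIN { printf "%.2f", l / n }')

if awk -v r="$ratio" 'BEGIN { exit !(r > 1) }'; then
    echo "loaded: load per CPU is $ratio"
else
    echo "not loaded: load per CPU is $ratio"
fi
```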

Where the rubber meets the road is in forming good habits and discouraging bad ones. To drag out a moldy cliché, the devil is in the details. And it's details like this that separate the mediocre from the great.

It's like driving a car: why make a left turn when you can just make three rights instead? Of course you can, and it works perfectly. But once you understand the power of left turns, three rights just seem silly.

It's not about saving one file handle, 17k of RAM and 0.004 seconds of CPU time. It's about the entire philosophy of using UNIX. The "power of left turns" in my illustration isn't merely redirecting input, it's the UNIX philosophy. Fully grokking this will help you excel beyond those around you, and you will earn respect from those who do understand.