What does "xargs grep" do?
I know the grep command and I am learning what xargs can do, so I read through this page, which gives some examples of how to use the xargs command.
I am confused by the last example, example 10. It says "The xargs command executes the grep command to find all the files (among the files provided by find command) that contained a string ‘stdlib.h’"
$ find . -name '*.c' | xargs grep 'stdlib.h'
./tgsthreads.c:#include <stdlib.h>
./valgrind.c:#include <stdlib.h>
./direntry.c:#include <stdlib.h>
./xvirus.c:#include <stdlib.h>
./temp.c:#include <stdlib.h>
...
...
...
However, how is that different from simply using
$ find . -name '*.c' | grep 'stdlib.h'
?
Obviously, I am still struggling with what exactly xargs is doing, so any help is appreciated!
Solution 1:
$ find . -name '*.c' | grep 'stdlib.h'
This pipes the output (stdout)* from find to (stdin of)* grep 'stdlib.h' as text (i.e. the filenames are treated as text). grep does its usual thing and finds the matching lines in this text (any file names which themselves contain the pattern). The contents of the files are never read.
$ find . -name '*.c' | xargs grep 'stdlib.h'
This constructs a command grep 'stdlib.h' to which each result from find is an argument, so this will look for matches inside each file found by find (xargs can be thought of as turning its stdin into arguments to the given command).*
Use -type f in your find command, or you will get errors from grep for matching directories. Also, if the filenames contain spaces, xargs will split them into separate arguments, so use the null separator by adding -print0 to find and -0 to xargs for more reliable results:
find . -type f -name '*.c' -print0 | xargs -0 grep 'stdlib.h'
*added these extra explanatory points as suggested in comment by @cat
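To make the difference concrete, here is a minimal sketch you can run in a scratch directory. The file names has_include.c and no_include.c are made up for the demonstration, and it assumes mktemp -d plus a find/xargs pair that supports -print0 and -0 (GNU and BSD versions do).

```shell
# Scratch directory with two hypothetical C files.
demo=$(mktemp -d)
printf '#include <stdlib.h>\n' > "$demo/has_include.c"
printf '#include <stdio.h>\n' > "$demo/no_include.c"

# Without xargs, grep searches the list of names as text.
# Neither name contains "stdlib.h", so the match count is 0.
names_matched=$(cd "$demo" && find . -type f -name '*.c' | grep -c 'stdlib.h')

# With xargs, the names become arguments, so grep opens the files.
# -l prints only the names of files whose contents match.
files_matched=$(cd "$demo" && find . -type f -name '*.c' -print0 | xargs -0 grep -l 'stdlib.h')

echo "matching lines in the name list: $names_matched"
echo "files whose contents match: $files_matched"

rm -rf "$demo"
```

The first count is zero because only the names were searched; the second command reports the one file that actually includes the header.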
Solution 2:
xargs takes its standard input and turns it into command line args.
find . -name '*.c' | xargs grep 'stdlib.h'
is very similar to
grep 'stdlib.h' $(find . -name '*.c') # UNSAFE, DON'T USE
And will give the same results as long as the list of filenames isn't too long for a single command line. (Linux supports megabytes of text on a single command line, so usually you don't need xargs.)
But both of these suck, because they break if your filenames contain spaces. Instead, find -print0 | xargs -0 works, but so does
find . -name '*.c' -exec grep 'stdlib.h' {} +
That never pipes the filenames anywhere: find batches them up into a big command line and runs grep directly.
Using \; instead of + runs grep separately for each file, which is much slower. Don't do that. The + form is specified by POSIX and supported by current GNU and BSD find, so you only need xargs to do this efficiently on an old find that lacks it.
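A quick way to see the difference between the two terminators is to run echo instead of grep and count the output lines, since each echo invocation prints one line. This is only a sketch: the three empty files are hypothetical and mktemp -d is assumed to be available.

```shell
# Scratch directory with three hypothetical, empty C files.
demo=$(mktemp -d)
touch "$demo/a.c" "$demo/b.c" "$demo/c.c"

# With +, find appends all names to one command line: echo runs once.
runs_plus=$(find "$demo" -type f -name '*.c' -exec echo {} + | wc -l | tr -d ' ')

# With \;, find starts the command once per file: echo runs three times.
runs_semicolon=$(find "$demo" -type f -name '*.c' -exec echo {} \; | wc -l | tr -d ' ')

echo "invocations with +: $runs_plus"
echo "invocations with \\;: $runs_semicolon"

rm -rf "$demo"
```

With only three files the cost is invisible, but with thousands of files the per-file process startup is what makes \; slow.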
If you leave out xargs, find | grep does its pattern matching against the list of filenames that find prints.
So at that point, you might as well just do find -name stdlib.h. Of course, with -name '*.c' -name stdlib.h, you won't get any output because those patterns can't both match, and find's default behaviour is to AND the tests together.
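As a small illustration of that AND behaviour, the sketch below compares two adjacent -name tests with the same tests joined by -o (OR). The files main.c and stdlib.h are hypothetical and created only for the demonstration.

```shell
# Scratch directory with one hypothetical file per pattern.
demo=$(mktemp -d)
touch "$demo/main.c" "$demo/stdlib.h"

# Adjacent tests are ANDed: no single name matches both patterns.
and_count=$(find "$demo" -type f -name '*.c' -name 'stdlib.h' | wc -l | tr -d ' ')

# -o ORs the tests; the escaped parentheses keep -type f applied to both.
or_count=$(find "$demo" -type f \( -name '*.c' -o -name 'stdlib.h' \) | wc -l | tr -d ' ')

echo "files matching both patterns: $and_count"
echo "files matching either pattern: $or_count"

rm -rf "$demo"
```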
Substitute less for the rest of the pipeline at any point to see what output that part of the pipeline produces.
Further reading: http://mywiki.wooledge.org/BashFAQ has some great stuff.
Solution 3:
In general, xargs is used for cases where you would pipe (with the symbol |) something from one command to another (Command1 | Command2), but the output of the first command is not correctly received as input by the second command.
This typically happens when the second command does not handle data arriving on standard input (stdin) correctly, or does not read stdin at all (e.g. multiple lines as input, the way the lines are laid out, the characters used, multiple parameters as input, the data type received, etc.). To give you a quick example, test the following:
Example 1:
ls | echo
- This prints only an empty line, since echo ignores its standard input and takes what to print from its arguments. If we use xargs here, it turns the input into arguments that echo can handle (e.g. as a single line of information)
ls | xargs echo
- This will output all the information from ls in a single line
Example 2:
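Let's say I have multiple Go files inside a folder called go. I would look for them with something like this:
find go -name '*.go' -type f | echo
- With the pipe symbol and the echo at the end, this does not work: echo ignores the list and prints an empty line.
find go -name '*.go' -type f | xargs echo
- Here it works thanks to xargs, which prints all the results on a single line. If I wanted each result from the find command on its own line, I would do the following:
find go -name '*.go' -type f | xargs -0 echo
- In this case, the same output from find would be shown by echo. This works because -0 splits the input only on NUL characters; find's default output contains none, so the whole newline-separated list reaches echo as a single argument. xargs -n 1 echo is a more conventional way to print one result per line.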
Commands like cp, echo, rm, less and others that expect their operands as command-line arguments rather than on standard input benefit from being used with xargs.
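The behaviour described in these examples can be checked without touching any files. The sketch below feeds three made-up words through a pipe; it assumes nothing beyond printf, echo and xargs.

```shell
# echo ignores standard input, so only its own newline comes out,
# and command substitution strips that, leaving an empty string.
ignored=$(printf 'one\ntwo\nthree\n' | echo)

# xargs turns the three input lines into three arguments of one echo call.
joined=$(printf 'one\ntwo\nthree\n' | xargs echo)

# -n 1 limits each echo call to one argument, so every item gets its own line.
per_line=$(printf 'one\ntwo\nthree\n' | xargs -n 1 echo)

echo "plain pipe into echo: [$ignored]"
echo "xargs echo: [$joined]"
echo "xargs -n 1 echo:"
echo "$per_line"
```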
Solution 4:
xargs is used to automatically generate command line arguments based (usually) on a list of files.
So, considering some alternatives to the following xargs command:
find . -name '*.c' -print0 | xargs -0 grep 'stdlib.h'
There are several reasons to use it instead of other options that weren't originally mentioned in other answers:
- find . -name '*.c' -exec grep 'stdlib.h' {} \; will generate one grep process for every file. This is generally considered bad practice, and may put a big load on the system if there are many files found.
- If there are a lot of files, a grep 'stdlib.h' $(find . -name '*.c') command will likely fail, because the output of the $(...) operation will exceed the maximum command line length allowed by the system.
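As a rough sketch of how xargs sidesteps that limit: it splits its input into as many command lines as needed instead of building a single huge one. Below, getconf ARG_MAX reports the system limit in bytes, and -n 4 is an artificially small batch size chosen only to make the splitting visible with ten made-up arguments.

```shell
# System limit on the combined size of arguments and environment, in bytes.
arg_max=$(getconf ARG_MAX)

# Ten arguments with at most four per invocation: echo runs three times
# (4 + 4 + 2), and each run prints one line.
batches=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | xargs -n 4 echo | wc -l | tr -d ' ')

echo "ARG_MAX is $arg_max bytes"
echo "echo was invoked $batches times"
```

Without -n, xargs picks the batch size itself so that each command line stays under the limit.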
As mentioned in other answers, the reason for using the -print0 argument to find and the -0 argument to xargs in this scenario is so that filenames with certain characters (e.g. quotes, spaces or even newlines) are still handled correctly.
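A minimal sketch of that failure mode, using a hypothetical file called "my file.c" in a scratch directory (mktemp -d and -print0/-0 support are assumed, as in GNU and BSD tools):

```shell
# Scratch directory with one hypothetical file whose name contains a space.
demo=$(mktemp -d)
printf '#include <stdlib.h>\n' > "$demo/my file.c"

# Newline-delimited input: xargs splits the name at the space, so grep is
# asked to open "./my" and "file.c", which do not exist. Errors are discarded.
unsafe=$(cd "$demo" && find . -type f -name '*.c' | xargs grep -l 'stdlib.h' 2>/dev/null)

# NUL-delimited input: the name reaches grep as one intact argument.
safe=$(cd "$demo" && find . -type f -name '*.c' -print0 | xargs -0 grep -l 'stdlib.h')

echo "without -print0/-0: [$unsafe]"
echo "with -print0/-0: [$safe]"

rm -rf "$demo"
```

The first result is empty even though the file contains the string, which is exactly the kind of silent miss the NUL separator avoids.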