What is the difference between find with -exec and xargs?
I am trying to learn Bash scripting. I want to execute some command on all files below my current directory that satisfy a certain condition. Specifically, I want to convert .flac
to .mp3
. Using
find -name *.flac
I can find all the files. However, I do not see the difference between executing a command using the option -exec
of find
and using xargs
. E.g.
find -name *.flac | xargs -i ffmpeg -i {} {}.mp3
compared to
find -name *.flac -exec ffmpeg -i {} {}.mp3 \;
Can someone point out the difference? What is better practice? What are the advantages/disadvantages?
Also: If I wanted to simultaneously delete the original file, how would I add a second command in the above code?
Summary:
Unless you are much more familiar with xargs
than -exec
, you will probably want to use -exec
when you use find
.
Since xargs
is a separate program, calling it is likely to be marginally less efficient than using -exec
, which is a feature of the find
program. We don't usually want to call an extra program if it doesn't provide any additional benefit in terms of reliability, performance or readability. Since find ... -exec ...
provides the ability to run commands with an argument list (as xargs
does) if possible, there is not really any advantage of using xargs
with find
over -exec
. In the case of ffmpeg
, we have to specify input and output files, so we can't make performance gains using either method to construct an argument list, and with xargs
removing the illogical original filename extension is more difficult.
What xargs
does
Note: The verbose flag (which prints the constructed command with its arguments) in xargs
is -t
, and the interactive flag (which causes the user to be prompted for confirmation to operate on each argument) is -p
. You may find both of these useful for understanding and testing its behaviour.
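As a quick illustration of the verbose flag, here is a minimal sketch that feeds two made-up items to xargs and captures what -t reports. The items a and b are placeholders, and the trace is written to stderr, which is why the redirection is needed to capture it.

```shell
# Feed two placeholder items to xargs; -t reports the constructed command
# on stderr before running it. We keep the trace and discard echo's output.
trace=$(printf 'a\nb\n' | xargs -t echo 2>&1 >/dev/null)
echo "xargs constructed: $trace"
```

Swapping -t for -p would prompt before each command runs, which is handy in an interactive terminal but not in a script.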
xargs
attempts to turn its STDIN (typically the STDOUT of the previous command that has been piped to it) into a list of arguments to some command.
command1 | xargs command2 [output of command1 will be appended here]
Since STDOUT or STDIN is just a stream of text (this is also why you shouldn't parse the output of ls
), xargs
is easily tripped up. It reads arguments as being delimited by spaces or newlines. Filenames are allowed to contain spaces and may even contain newlines, and such filenames will cause unexpected behaviour. Let's say you have a file called foo bar
. When a list containing this filename is piped to xargs
, it attempts to run the given command on foo
and on bar
.
The same problem occurs when you type command foo bar
, and you know you can avoid it by quoting the space or the whole name, e.g. command foo\ bar
or command "foo bar"
, but even if we are able to quote the list passed to xargs
we don't usually want to, because we don't want the whole list to be treated as a single argument. The standard solution to this is to use the null character as delimiter, since filenames cannot contain it:
find path test(s) -print0 | xargs -0 command
This causes find
to append the null character to each filename instead of a newline, and xargs
to treat only the null character as delimiter.
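To see the difference for yourself, here is a small sketch that runs in a scratch directory (created with mktemp, so nothing of yours is touched). The filename "foo bar.flac" is just an example, and the sketch assumes the temporary directory path itself contains no spaces.

```shell
# Scratch directory containing one file with a space in its name.
tmpdir=$(mktemp -d)
touch "$tmpdir/foo bar.flac"

# Default delimiters: xargs splits the path at the space, so echo is
# called with two separate arguments (one per line because of -n 1).
unsafe_count=$(find "$tmpdir" -name '*.flac' | xargs -n 1 echo | wc -l)

# Null delimiters: the whole path arrives as a single argument.
safe_count=$(find "$tmpdir" -name '*.flac' -print0 | xargs -0 -n 1 echo | wc -l)

echo "without -print0/-0: $unsafe_count argument(s)"
echo "with -print0/-0:    $safe_count argument(s)"

rm -r "$tmpdir"
```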
Problems may still occur if the command doesn't accept multiple arguments or if the argument list is extremely long.
In this case you are using ffmpeg
, which expects input files to be specified first, and output files to be specified last. We can tell ffmpeg
which file(s) to use as input explicitly with the -i
flag, but we need to give the output filename (from which the format is usually guessed, though we can also specify it) too. So, to construct suitable commands, you need to use the replace string option (-I
or -i
) of xargs
to specify both the input and output files:
... | xargs -I{} command {} {}.out
(the documentation says that -i
is deprecated for this purpose and we should use -I
instead; as far as I can tell this is because -I is the form specified by POSIX, whereas -i is not. When using -I
, you must specify the replacement string ({}
is normally used) immediately after the option. With -i
you can omit the replacement string, in which case {}
is assumed by default.)
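A harmless way to check what the replace string does is to put echo in front of the real command. This sketch uses a made-up filename and only prints the ffmpeg command that would be run:

```shell
# Dry run: one input line containing a space, substituted wherever {}
# appears. echo prints the command instead of running ffmpeg.
planned=$(printf '%s\n' 'my song.flac' | xargs -I{} echo ffmpeg -i {} {}.mp3)
echo "$planned"
```

Note that the output name keeps the original extension (my song.flac.mp3), which is the "illogical filename" problem discussed in this answer.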
The -I
option causes the input to be split only on newlines, not spaces (quotes and backslashes are still treated specially, though), so if you are sure your filenames will not contain newlines, quotes or backslashes, you do not have to use -print0 | xargs -0
when you use -I
. If you are uncertain, you can still use the safer syntax:
find -name "*.flac" -print0 | xargs -0I{} ffmpeg -i {} {}.mp3
However, the performance benefit of xargs
(which enables us to run a command once with a list of arguments) is lost here, since ffmpeg
must be run once for each pair of input and output files (you can see this easily by prepending echo
to ffmpeg
to test the above command). This also produces an illogical filename and doesn't allow you to run multiple commands. To do the latter, you can call bash
, as in dessert's answer:
... | xargs -I{} bash -c 'ffmpeg -i "$1" "$1.mp3" && rm "$1"' bash {}
but renaming is tricky.
How -exec
is different
When you use the -exec
option to find
, the found files are passed as arguments to the command after -exec
. They aren't turned into text. With the syntax:
find ... -exec command {} \;
command
is run once for each file found. With the syntax
find ... -exec command {} +
an argument list is constructed from the found files so that we can run the command only once (or only as many times as required) on multiple files, giving the performance benefit provided by xargs
. However, since the filename arguments aren't constructed from a stream of text, using -exec
doesn't have the problem xargs
has of breaking on spaces and other special characters.
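The difference between the two terminators can be made visible by having the executed command report how many arguments it received. This is only a sketch with made-up filenames in a scratch directory; sh -c 'echo "$#"' stands in for a real command.

```shell
# Scratch directory with three files, one of them containing a space.
tmpdir=$(mktemp -d)
touch "$tmpdir/a.flac" "$tmpdir/b c.flac" "$tmpdir/d.flac"

# \; runs the command once per file: three runs, one argument each.
per_file=$(find "$tmpdir" -name '*.flac' -exec sh -c 'echo "$#"' sh {} \;)

# + batches the files: a single run that receives all three arguments.
batched=$(find "$tmpdir" -name '*.flac' -exec sh -c 'echo "$#"' sh {} +)

printf 'with \\; :\n%s\n' "$per_file"
printf 'with +  :\n%s\n' "$batched"

rm -r "$tmpdir"
```

The file with a space in its name is still counted as one argument in both cases, because find hands the names over directly rather than through a text stream.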
With ffmpeg
, we can't use +
for the same reason as xargs
didn't give any performance benefit; since we need to specify both input and output, the command must be run on each file individually. We have to use some form of
find -name "*.flac" -exec ffmpeg -i {} {}.out \;
This, again, will give you a rather illogically named file, so you may want to strip the original extension, as dessert's answer explains how to do with string manipulation (not easily done in xargs
; another reason to use -exec
). It also explains how to run multiple commands on the file so that you can safely remove the original file after a successful conversion.
Instead of repeating dessert's recommendation, which I agree with, I will suggest an alternative to find
, which allows similar flexibility to running bash -c
after -exec
; a bash for
loop:
shopt -s globstar              # allow recursive globbing with **
for f in ./**/*.flac; do       # for all files ending with .flac
    # convert them, stripping the original extension from the new filename
    echo ffmpeg -i "$f" "${f%.flac}.mp3" &&
        echo rm -v "$f"        # if that succeeded, delete the original
done
shopt -u globstar              # turn recursive globbing off
Remove both echo commands after testing to actually operate on the files.
ffmpeg
doesn't recognise --
to mark the end of options, so to avoid filenames beginning with -
being interpreted as options, we use ./
to indicate the current directory instead of starting with **
, so that all paths begin with ./
instead of arbitrary filenames. This means we don't need to use --
with rm
(which does recognise it) either.
Note: you should quote your -name
test expression if it contains any wildcard characters; otherwise the shell will expand them if possible (i.e. if they match any files in the current directory) before they are passed to find
, so in the first place, use
find -name "*.flac"
to prevent unexpected behaviour.
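Here is a small sketch of what goes wrong without the quotes. It builds a scratch directory with one .flac file at the top level and one in a subdirectory (both names are made up) and counts what each form of the command finds.

```shell
# Scratch tree: one match in the starting directory, one below it.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/sub"
touch "$tmpdir/top.flac" "$tmpdir/sub/nested.flac"

# Unquoted (deliberately): the shell expands *.flac to top.flac before
# find starts, so find only looks for files named exactly top.flac.
unquoted=$(cd "$tmpdir" && find . -name *.flac | wc -l)

# Quoted: find receives the pattern itself and matches both files.
quoted=$(cd "$tmpdir" && find . -name '*.flac' | wc -l)

echo "unquoted pattern found $unquoted file(s), quoted found $quoted"

rm -r "$tmpdir"
```

If several files in the starting directory had matched, the unquoted form would instead have passed extra arguments to find, which typically makes it stop with an error.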
Generally one tries to call as few commands as possible, but in your case I think it's a matter of taste – I'd go with -exec
, using it like so:
find . -name '*.flac' -exec bash -c 'ffmpeg -i "$0" "${0%flac}mp3" && rm "$0"' {} \;
The trick is to call bash
with the -c
option. This way you can not only execute multiple commands but also use Bash parameter expansion to remove the flac
ending from your filenames – I suppose you don't really want to end up with files named filename.flac.mp3, do you?
Explanations
- bash -c '…' {} – run the command(s) … in bash with the filename as the first argument (accessible with $0)
- ${0%flac} – strip flac from the end of the filename
- && rm "$0" – only if the preceding command succeeded, remove the original file
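If you want to check the renaming before converting anything, a dry run along these lines may help. It is a sketch only: the filename is made up, echo replaces the real ffmpeg call, and it uses a common variant in which a placeholder (sh) fills $0 so that the filename is available as $1.

```shell
# Scratch directory with one example file.
tmpdir=$(mktemp -d)
touch "$tmpdir/my song.flac"

# Dry run: print the ffmpeg command instead of running it.
# ${1%.flac} removes the trailing .flac before .mp3 is appended.
planned=$(find "$tmpdir" -name '*.flac' \
    -exec sh -c 'echo ffmpeg -i "$1" "${1%.flac}.mp3"' sh {} \;)
echo "$planned"

rm -r "$tmpdir"
```

Once the printed command looks right, dropping the echo and appending && rm "$1" inside the quoted script gives the convert-then-delete behaviour described above.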
As Zanna and dessert already answered, -exec
should be preferred when xargs
is not necessary ("We don't usually want to call an extra program if it doesn't provide any additional benefit in terms of reliability, performance or readability.")
While that is totally correct I want to add that xargs
in combination with the -P
flag can provide a substantial benefit in terms of performance.
xargs
will run several processes in parallel (multiprocessing rather than multithreading), much like the parallel
command.
-P max-procs, --max-procs=max-procs
Run up to max-procs processes at a time; the default is 1. If max-procs is 0, xargs will run as many processes as possible at a time. Use the -n option or the -L option with -P; otherwise chances are that only one exec will be done.
[...]
This especially helps with processes that do not run multi-threaded by themselves. In your case ffmpeg
takes care of multithreading itself, so -P probably won't help and may even have a negative effect on performance.
find . -name "*.ext" -print0 | xargs -0 -I{} -P 20 command -in {} -out {}.out
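To try -P without touching any real files, here is a minimal sketch that processes four made-up items with at most two processes running at a time. It assumes an xargs that supports -P (GNU and BSD xargs both do); the sh -c 'echo …' part is a stand-in for a real single-threaded command.

```shell
# Four null-delimited placeholder items, one per process (-n 1),
# with up to two processes running concurrently (-P 2).
# The results are sorted because parallel jobs may finish in any order.
results=$(printf '%s\0' one two three four |
    xargs -0 -n 1 -P 2 sh -c 'echo "done: $1"' sh |
    sort)
echo "$results"
```

With a real workload, a sensible starting point for the -P value is usually the number of CPU cores rather than a large fixed number.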