Execute command on all files in a directory
Could somebody please provide the code to do the following: assume there is a directory of files, all of which need to be run through a program. The program outputs the results to standard out. I need a script that will go into the directory, execute the command on each file, and concatenate the output into one big output file.
For instance, to run the command on 1 file:
$ cmd [option] [filename] > results.out
The following bash code will pass $file to the command, where $file represents each file in /dir:
for file in /dir/*
do
cmd [option] "$file" >> results.out
done
Example
el@defiant ~/foo $ touch foo.txt bar.txt baz.txt
el@defiant ~/foo $ for i in *.txt; do echo "hello $i"; done
hello bar.txt
hello baz.txt
hello foo.txt
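To tie this back to the question's results.out, the same example loop can send its combined output to a file; for instance:
el@defiant ~/foo $ for i in *.txt; do echo "hello $i"; done > results.out
el@defiant ~/foo $ cat results.out
hello bar.txt
hello baz.txt
hello foo.txt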
How about this:
find /some/directory -maxdepth 1 -type f -exec cmd option {} \; > results.out
- The -maxdepth 1 argument prevents find from recursively descending into any subdirectories. (If you want such nested directories to get processed, you can omit this.)
- -type f specifies that only plain files will be processed.
- -exec cmd option {} tells it to run cmd with the specified option for each file found, with the filename substituted for {}.
- \; denotes the end of the command.
- Finally, the output from all the individual cmd executions is redirected to results.out.
However, if you care about the order in which the files are processed, you might be better off writing a loop. I think find processes the files in inode order (though I could be wrong about that), which may not be what you want.
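If the order does matter, one possible sketch (assuming GNU find and GNU sort, for -print0 and -z) is to sort the NUL-delimited file list by name before running the command:
find /some/directory -maxdepth 1 -type f -print0 | sort -z | xargs -0 -I{} cmd option {} > results.out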
I'm doing this on my Raspberry Pi from the command line by running:
for i in *;do cmd "$i";done
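To collect everything into one output file as the question asks, the redirection can go after the loop; for example:
for i in *; do cmd "$i"; done > results.out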
The accepted/high-voted answers are great, but they are lacking a few nitty-gritty details. This post covers how to better handle the cases where the shell path-name expansion (glob) fails, where filenames contain embedded newlines or leading dash symbols, and how to move the command output redirection out of the for-loop when writing the results to a file.
When running the shell glob expansion using *, there is a possibility for the expansion to fail if there are no files present in the directory, in which case the un-expanded glob string is passed to the command being run, which could have undesirable results. The bash shell provides an extended shell option for this, nullglob. So, inside the directory containing your files, the loop basically becomes:
shopt -s nullglob
for file in ./*; do
cmdToRun [option] -- "$file"
done
This lets the for loop safely run zero iterations when the expression ./* doesn't match any files (i.e. when the directory is empty). Alternatively, in a POSIX compliant way (nullglob is bash specific):
for file in ./*; do
[ -f "$file" ] || continue
cmdToRun [option] -- "$file"
done
Here the loop body is entered even when the expansion fails: the condition [ -f "$file" ] checks whether the un-expanded string ./* is an actual file in that directory, which it won't be. When that check fails, continue moves on to the next iteration, so the command is never run on the bogus name.
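For a quick illustration of the un-expanded glob that both approaches guard against (a hypothetical empty directory):
$ mkdir empty && cd empty
$ for file in ./*; do echo "got: $file"; done   # without nullglob, the literal glob leaks through
got: ./*
$ shopt -s nullglob
$ for file in ./*; do echo "got: $file"; done   # with nullglob, the loop body never runs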
Also note the usage of -- just before passing the file name argument. This is needed because, as noted previously, filenames can begin with a dash. A command that isn't told otherwise interprets such a name as a command option and behaves as if a flag had been provided. The -- signals the end of command line options, meaning the command shouldn't parse any strings beyond this point as flags, but only as filenames.
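As a small illustration (using cat purely as a stand-in for the actual command, which is an assumption here):
$ touch ./-n      # a file whose name begins with a dash
$ cat -n          # without --, cat parses "-n" as its own option and waits on stdin
$ cat -- -n       # with --, cat treats "-n" as a filename and prints the (empty) file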
Double-quoting the filenames properly solves the cases when the names contain glob characters or white-spaces. But *nix filenames can also contain newlines in them. So we delimit filenames with the only character that cannot be part of a valid filename: the null byte (\0). Since bash internally uses C style strings, in which the null byte marks the end of a string, it is the right candidate for this.
So, using printf to emit each filename followed by this NUL byte, and the -d option of the read command to split on it, we can do the following:
( shopt -s nullglob; printf '%s\0' ./* ) | while read -rd '' file; do
cmdToRun [option] -- "$file"
done
The nullglob and the printf are wrapped in (..), which means they run in a sub-shell (child shell); this avoids the nullglob option persisting in the parent shell once the command exits. The -d '' option of the read command is not POSIX compliant, so a bash shell is needed for this. Using the find command, the same can be done as:
while IFS= read -r -d '' file; do
cmdToRun [option] -- "$file"
done < <(find -maxdepth 1 -type f -print0)
For find implementations that don't support -print0 (i.e. implementations other than GNU and FreeBSD), it can be emulated using printf:
find . -maxdepth 1 -type f -exec printf '%s\0' {} \; | xargs -0 cmdToRun [option] --
Another important fix is to move the redirection out of the for-loop to reduce file I/O. With the redirection inside the loop, the shell has to execute system calls twice per iteration, once to open and once to close the file descriptor associated with the output file. This becomes a performance bottleneck when running a large number of iterations, so the recommended approach is to move it outside the loop.
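For contrast, this is the shape being avoided, with the redirection (here appending) inside the loop, so the output file is opened and closed on every single iteration:
( shopt -s nullglob; printf '%s\0' ./* ) | while read -rd '' file; do
cmdToRun [option] -- "$file" >> results.out
done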
Extending the above code with these fixes, you could do:
( shopt -s nullglob; printf '%s\0' ./* ) | while read -rd '' file; do
cmdToRun [option] -- "$file"
done > results.out
Here the output of the command for each iteration goes to the loop's stdout, and the target file is opened just once, for the whole loop, to receive that stream, instead of being opened and closed on every iteration. The equivalent find version of the same would be:
while IFS= read -r -d '' file; do
cmdToRun [option] -- "$file"
done < <(find -maxdepth 1 -type f -print0) > results.out
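As an end-to-end illustration of the final approach (using wc -c purely as a stand-in for the actual command, which is an assumption here):
$ cd /dir
$ while IFS= read -r -d '' f; do wc -c -- "$f"; done < <(find . -maxdepth 1 -type f -print0) > results.out
$ cat results.out            # combined wc output, one line per file processed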