Renaming a large number of image files with bash

I need to rename approx. 70,000 files. For example: From sb_606_HBO_DPM_0089000 to sb_606_dpm_0089000 etc.

The number range goes from 0089000 to 0163022. It's only the first part of the name that needs to change. All the files are in a single directory, and are numbered sequentially (an image sequence). The numbers must remain unchanged.

When I try this in bash it grizzles at me that the 'Argument list is too long'.

Edit:

I first tried renaming a single file with mv:

mv sb_606_HBO_DPM_0089000.dpx sb_606_dpm_0089000.dpx

Then I tried renaming a range (I learned here last week how to move a load of files, so I thought the same syntax might work for renaming the files...). I think I tried the following (or something like it):

mv sb_606_HBO_DPM_0{089000..163023}.dpx sb_606_dpm_0{089000..163023}.dpx

One way is to use find with -exec and the + terminator. This builds an argument list, but splits it across as many calls as needed to process all the files without exceeding the maximum argument list length. It is suitable when every argument is treated the same way, which is the case with rename but not with mv (mv treats its last argument as the destination).

You may need to install Perl rename:

sudo apt install rename

Then you can use, for example:

find . -maxdepth 1 -exec rename -n 's/_HBO_DPM_/_dpm_/' {} +

Remove -n after testing, to actually rename the files.
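If installing Perl rename is not an option, the same batching can be combined with plain mv by handing each batch to a small inline shell loop. This is only a sketch: the function name rename_batched is made up for illustration, and it assumes the files sit directly in the current directory.

```bash
# Hypothetical helper: rename *_HBO_DPM_* files in the current directory.
# find passes the names in batches ({} +); the inline bash loop then
# runs one mv per name, so no single command line gets too long.
rename_batched() {
  find . -maxdepth 1 -type f -name '*_HBO_DPM_*' -exec bash -c '
    for f; do
      mv -- "$f" "${f/_HBO_DPM_/_dpm_}"
    done
  ' _ {} +
}
```

To preview the result first, put echo in front of mv inside the loop, then remove it once the output looks right.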


I'm going to suggest three alternatives. Each is a simple single-line command, but I'll provide variants for more complicated cases, mainly in case the files to process are mixed with other files in the same directory.

mmv

I'd use the mmv command from the package of the same name:

mmv '*HBO_DPM*' '#1dpm#2'

Note that the arguments are passed as strings, so the glob expansion does not happen in the shell. The command receives exactly two arguments, and then finds corresponding files internally, without tight limits on the number of files. Also note that the command above assumes that all the files which match the first glob shall be renamed. Of course you are free to be more specific:

mmv 'sb_606_HBO_DPM_*' 'sb_606_dpm_#1'

If you have files outside the requested number range in the same directory, you might be better off with the loop over numbers given further down in this answer. However you could also use a sequence of mmv invocations with suitable patterns:

mmv 'sb_606_HBO_DPM_0089*'       'sb_606_dpm_0089#1'    # 0089000-0089999
mmv 'sb_606_HBO_DPM_009*'        'sb_606_dpm_009#1'     # 0090000-0099999
mmv 'sb_606_HBO_DPM_01[0-5]*'    'sb_606_dpm_01#1#2'    # 0100000-0159999
mmv 'sb_606_HBO_DPM_016[0-2]*'   'sb_606_dpm_016#1#2'   # 0160000-0162999
mmv 'sb_606_HBO_DPM_01630[01]?'  'sb_606_dpm_01630#1#2' # 0163000-0163019
mmv 'sb_606_HBO_DPM_016302[0-2]' 'sb_606_dpm_016302#1'  # 0163020-0163022

loop over numbers

If you want to avoid installing anything, or need to select by number range avoiding matches outside this range, and you are prepared to wait for 74,023 command invocations, you could use a plain bash loop:

for i in {0089000..0163022}; do mv sb_606_HBO_DPM_$i sb_606_dpm_$i; done

This works particularly well here since there are no gaps in the sequence. Otherwise you might want to check whether the source file actually exists.

for i in {0089000..0163022}; do
  test -e sb_606_HBO_DPM_$i && mv sb_606_HBO_DPM_$i sb_606_dpm_$i
done

Note that, in contrast to for ((i=89000; i<=163022; ++i)), the brace expansion preserves leading zeros; this has been supported since some Bash release a couple of years ago. It is actually a change I requested, so I'm happy to see use cases for it.

Further reading: Brace Expansion in the Bash info pages, particularly the part about {x..y[..incr]}.
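To see the difference between the two loop styles, here is a small sketch on a four-number range. It assumes a Bash recent enough to zero-pad brace expansions; the variable name padded is only illustrative.

```bash
# Brace expansion keeps the padding when an endpoint has leading zeros.
echo {0089998..0090001}
# prints: 0089998 0089999 0090000 0090001

# A C-style loop counts with plain integers, so the padding has to be
# restored explicitly, here with printf and a width of 7 digits.
for ((i = 89998; i <= 90001; ++i)); do
  printf -v padded '%07d' "$i"
  echo "sb_606_dpm_$padded"
done
```

With the C-style loop the width is fixed by the printf format, so it has to match the number of digits used in the file names.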

loop over files

Another option would be to loop over a suitable glob, instead of just looping over the integer range in question. Something like this:

for i in *HBO_DPM*; do mv "$i" "${i/HBO_DPM/dpm}"; done

Again this is one mv invocation per file. And again the loop is over a long list of elements, but the whole list is not passed as an argument to a subprocess, but handled internally by bash, so the limit won't cause you problems.

Further reading: Shell Parameter Expansion in the Bash info pages, documenting ${parameter/pattern/string} among others.
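The substitution can be tried on a single name before touching any files. A minimal sketch; the helper name new_name is hypothetical.

```bash
# Hypothetical helper: print the new name for one file name.
new_name() {
  local old=$1
  # ${var/pattern/string} replaces only the first match of pattern;
  # names without a match are printed unchanged.
  printf '%s\n' "${old/HBO_DPM/dpm}"
}

new_name sb_606_HBO_DPM_0089000   # prints sb_606_dpm_0089000
```

Because a name without HBO_DPM comes back unchanged, the glob in the loop is what keeps mv from being called on unrelated files.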

If you wanted to restrict the number range to the one you provided, you could add a check for that:

for i in sb_606_HBO_DPM_+([0-9]); do
  if [[ "${i##*_*(0)}" -ge 89000 ]] && [[ "${i##*_*(0)}" -le 163022 ]]; then
    mv "$i" "${i/HBO_DPM/dpm}"
  fi
done

Here ${i##pattern} removes the longest prefix matching pattern from $i. That longest prefix is defined as anything, then an underscore, then zero or more zeros. The latter is written as *(0) which is an extended glob pattern that depends on the extglob option being set. Removing leading zeros is important to treat the number as base 10 not base 8. The +([0-9]) in the loop argument is another extended glob, matching one or more digits, just in case you have files there that start the same but don't end in a number.
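The prefix removal can be checked in isolation. This sketch enables extglob first, since the option is usually off in scripts; the helper name frame_number is hypothetical.

```bash
shopt -s extglob   # enable extended patterns such as *(0) and +([0-9])

# Hypothetical helper: print the frame number without leading zeros.
frame_number() {
  local name=$1
  # Strip the longest prefix of the form: anything, "_", zero or more "0".
  printf '%s\n' "${name##*_*(0)}"
}

frame_number sb_606_HBO_DPM_0089000   # prints 89000
frame_number sb_606_HBO_DPM_0163022   # prints 163022
```

The stripped value is safe to compare with -ge and -le, whereas 0089000 itself would be read as an invalid octal number.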


One way to work around the ARG_MAX limit is to use the bash shell's builtin printf:

printf '%s\0' sb_* | xargs -0 rename -n 's/HBO_DPM/dpm/'

For example, passing the glob directly fails:

rename -n 's/HBO_DPM/dpm/' sb_*
bash: /usr/bin/rename: Argument list too long

but

printf '%s\0' sb_* | xargs -0 rename -n 's/HBO_DPM/dpm/'
rename(sb_606_HBO_DPM_0089000, sb_606_dpm_0089000)
.
.
.
rename(sb_606_HBO_DPM_0163022, sb_606_dpm_0163022)
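The reason this works is that printf is a shell builtin, so the long list never has to pass through the limit on arguments to an external command, and xargs then splits the names into batches that fit. A small sketch of both points; the -n 2 option is used here only to force tiny batches so the splitting becomes visible.

```bash
# Upper bound, in bytes, on the arguments plus environment that can be
# passed to a single external command on this system.
getconf ARG_MAX

# Each output line below comes from a separate echo invocation.
printf '%s\0' a b c d e | xargs -0 -n 2 echo
# prints:
# a b
# c d
# e
```

Without -n, xargs picks the batch size itself, so rename is typically started only a handful of times for the whole directory.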

find . -type f -exec bash -c 'echo "$1" "${1/HBO_DPM/dpm}"' _ {} \;
./sb_606_HBO_DPM_0089000 ./sb_606_dpm_0089000

This searches the current directory (.) for regular files (-type f) and runs the command given to -exec ... \; once per file. Inside that command the file found is available as $1, and the new name is $1 with HBO_DPM replaced by dpm.

Replace echo with mv to perform the rename.


You could write a little Python script, something like:

import os
for name in os.listdir("."):
    if "HBO_DPM" in name:  # skip unrelated files, including this script
        os.rename(name, name.replace("HBO_DPM", "dpm"))

Save that as rename.py in the folder the files are in, then, with a terminal open in that folder, run:

python rename.py