How to bulk-rename files with invalid encoding or bulk-replace invalid encoded characters?
You're going to run into some problems if you want to rename files and directories at the same time. Renaming just a file is easy enough, but you also want to make sure the directories are renamed. You can't simply run mv Motörhead/Encöding Motorhead/Encoding, since Motorhead won't exist at the time of the call.
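To see the problem concretely, here is a small sketch that works in a throwaway directory created with mktemp -d (my choice for the demo; it never touches your real files). It shows the single mv failing, and the child-then-parent sequence succeeding:

```shell
#!/usr/bin/env bash
# Demo: one mv cannot rename two path components at once.
# Everything happens in a scratch directory that is deleted on exit.
tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir -p "Motörhead/Encöding"

# Fails: the target's parent directory "Motorhead" does not exist yet.
if mv "Motörhead/Encöding" "Motorhead/Encoding" 2>/dev/null; then
  echo "single mv succeeded (unexpected)"
else
  echo "single mv failed: Motorhead/ does not exist yet"
fi

# Works: rename the deepest component first, then its parent.
mv "Motörhead/Encöding" "Motörhead/Encoding"
mv "Motörhead" "Motorhead"
ls -d Motorhead/Encoding
```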
So we need a depth-first traversal of all files and folders that visits each folder's contents before the folder itself, and at each step we rename only the current file or folder. The following works with GNU find and Bash 4.2.42 on my OS X:
#!/usr/bin/env bash
find "$1" -depth -print0 | while IFS= read -r -d '' file; do
  d="$( dirname "$file" )"
  f="$( basename "$file" )"
  new="${f//[^a-zA-Z0-9\/\._\-]/}"
  if [ "$f" != "$new" ]  # if equal, name is already clean, so leave alone
  then
    if [ -e "$d/$new" ]
    then
      echo "Notice: \"$new\" and \"$f\" both exist in $d:"
      ls -ld "$d/$new" "$d/$f"
    else
      echo mv "$file" "$d/$new"  # remove "echo" to actually rename things
    fi
  fi
done
You can change the pattern to new="${f//[\\\/\:\*\?\"<>|]/}" if you only want to remove the characters that Windows cannot handle in file names (\ / : * ? " < > |).
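Here is a sketch comparing the two patterns on a few sample names. The helper names clean_strict and clean_windows are hypothetical, and setting LC_ALL=C is an assumption I added so that a-z and A-Z mean plain ASCII letters regardless of your locale:

```shell
#!/usr/bin/env bash
# Compare the strict pattern with the Windows-only pattern.
export LC_ALL=C   # assumption: ASCII ranges, byte-wise matching

clean_strict() {   # keep only letters, digits, / . _ -
  local f="$1"
  printf '%s\n' "${f//[^a-zA-Z0-9\/\._\-]/}"
}

clean_windows() {  # drop only \ / : * ? " < > |
  local f="$1"
  printf '%s\n' "${f//[\\\/\:\*\?\"<>|]/}"
}

for name in 'with space' 'what?.txt' 'a:b|c.txt' 'with-hyphen.txt'; do
  printf '%-16s strict=%-16s windows=%s\n' \
    "$name" "$(clean_strict "$name")" "$(clean_windows "$name")"
done
```

Note that the Windows pattern leaves spaces alone, while the strict pattern removes them.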
Save this script as rename.sh, make it executable with chmod +x rename.sh, and then call it like ./rename.sh /some/path.
Make sure to resolve any file name collisions (the "Notice" announcements).
If you're absolutely sure it does the right replacements, remove the echo from the script to actually rename things instead of just printing what it would do.
To be safe, I'd recommend testing this on a small subset of files first.
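One way to follow that advice without touching real data is to rehearse the loop on a scratch tree. In this sketch, rename_tree and the DRY_RUN switch are hypothetical additions (the loop body otherwise mirrors rename.sh), the tree uses ASCII-only names so the result does not depend on Unicode normalization, and LC_ALL=C is assumed:

```shell
#!/usr/bin/env bash
# Rehearse the rename loop on a throwaway tree.
export LC_ALL=C   # assumption: [a-zA-Z] means ASCII letters only

# Hypothetical wrapper around the loop from rename.sh.
# DRY_RUN=1 (default) only prints; DRY_RUN=0 really renames.
rename_tree() {
  find "$1" -depth -print0 | while IFS= read -r -d '' file; do
    d="$(dirname "$file")"
    f="$(basename "$file")"
    new="${f//[^a-zA-Z0-9\/\._\-]/}"
    [ "$f" = "$new" ] && continue            # already clean
    if [ -e "$d/$new" ]; then
      echo "Notice: \"$new\" and \"$f\" both exist in $d:"
      ls -ld "$d/$new" "$d/$f"
    elif [ "${DRY_RUN:-1}" = 1 ]; then
      echo mv "$file" "$d/$new"              # dry run: print only
    else
      mv "$file" "$d/$new"
    fi
  done
}

tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/test/with space/sub dir" "$tmp/test/work"
touch "$tmp/test/with space/sub dir/a file" \
      "$tmp/test/work/resume" "$tmp/test/work/res ume"

rename_tree "$tmp/test"             # dry run: prints planned commands
DRY_RUN=0 rename_tree "$tmp/test"   # real run on the scratch tree
```

The nested "with space/sub dir/a file" ends up as withspace/subdir/afile, while "res ume" is left alone because renaming it would clobber resume.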
Options explained
To explain what goes on here:
- -depth makes find process the contents of each directory before the directory itself, so we can "roll up" everything from the deepest level. By default, find lists a directory before its contents (which is still not breadth-first).
- -print0 makes the find output null-delimited, so we can read it with read -d '' into the file variable. This lets us deal with all kinds of weird file names, including ones with spaces and even newlines.
- We get the directory of the file with dirname. Don't forget to always quote your variables properly; otherwise, any path with spaces or globbing characters would break this script.
- We get the actual file name (or directory name) with basename.
- Then, we remove every invalid character from $f using Bash's pattern substitution (${f//pattern/}). Invalid means anything that's not a lowercase or uppercase letter, a digit, a slash (\/), a dot (\.), an underscore, or a hyphen-minus (\-).
- If $f is already clean (the cleaned name is identical to the current name), we skip it.
- If $new already exists in directory $d (e.g., you have files named resume and résumé in the same directory), we issue a warning instead of renaming. You don't want to rename it, because mv would silently overwrite the existing file, and on some systems mv foo foo causes a problem. This check also catches names made up entirely of invalid characters, since the cleaned name is then empty and "$d/" always exists.
- Otherwise, we finally rename the original file (or directory) to its new name.
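The two find options above can be checked in a scratch directory. This sketch (the helper names rel_paths and count_records are made up for illustration) prints the visiting order with and without -depth, then counts a single file whose name contains a newline, once NUL-delimited and once newline-delimited:

```shell
#!/usr/bin/env bash
# Show find's visiting order with and without -depth, and how -print0
# plus read -d '' keeps a newline inside a name intact.
tmp="$(mktemp -d)"
trap 'rm -rf "$tmp"' EXIT

mkdir -p "$tmp/a/b" "$tmp/nl"
touch "$tmp/a/b/file"
touch "$tmp/nl/x"$'\n'"y"   # one file whose name contains a newline

# Print each NUL-terminated path relative to $tmp, one per line.
rel_paths() {
  local p
  while IFS= read -r -d '' p; do
    printf '%s\n' "${p#"$tmp"/}"
  done
}

# Count NUL-terminated records.
count_records() {
  local n=0 p
  while IFS= read -r -d '' p; do
    n=$((n + 1))
  done
  echo "$n"
}

echo "default order:"; find "$tmp/a" -print0 | rel_paths
echo "-depth order:";  find "$tmp/a" -depth -print0 | rel_paths

echo "records, NUL-delimited: $(find "$tmp/nl" -type f -print0 | count_records)"
echo "lines, newline-delimited: $(find "$tmp/nl" -type f | wc -l)"
```

The newline-delimited count reports two lines for one file, which is exactly the breakage -print0 avoids.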
Since each mv renames only the last component of a path, and find -depth emits the deepest entries first, renaming Motörhead/Encöding to Motorhead/Encoding is done in two steps:
mv Motörhead/Encöding Motörhead/Encoding
mv Motörhead Motorhead
This ensures all replacements are done in the correct order.
Example files and test run
Let's assume some files in a base folder called test:
test
test/Motörhead
test/Motörhead/anöther_file.mp3
test/Motörhead/Encöding
test/Randöm
test/Täst
test/Täst/Töst
test/with space
test/with-hyphen.txt
test/work
test/work/resume
test/work/résumé
test/work/schedule
Here is the output from a run in debug mode (with the echo in front of the mv), i.e., the commands that would be called, and the collision warnings:
mv test/Motörhead/anöther_file.mp3 test/Motörhead/another_file.mp3
mv test/Motörhead/Encöding test/Motörhead/Encoding
mv test/Motörhead test/Motorhead
mv test/Randöm test/Random
mv test/Täst/Töst test/Täst/Tost
mv test/Täst test/Tast
mv test/with space test/withspace
Notice: "resume" and "résumé" both exist in test/work:
-rw-r--r-- … … test/work/resume
-rw-r--r-- … … test/work/résumé
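Notice the absence of messages for with-hyphen.txt, schedule, and test itself.
Also note that ö turned into o instead of disappearing. OS X's HFS+ file system stores file names in decomposed form, where ö is a plain o followed by a combining diaeresis, so the pattern strips only the combining mark. On a file system that stores precomposed names, ö is a single character and is removed entirely, at least in the C locale (Motörhead becomes Motrhead).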
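A quick way to see how the pattern treats a decomposed ö (o plus U+0308 COMBINING DIAERESIS) versus a precomposed ö (U+00F6) is to run the same substitution on both byte sequences. This sketch pins LC_ALL=C (my assumption, so that a-z means ASCII and matching is byte-wise), spells both forms out with $'\x..' escapes, and uses a hypothetical helper named strip:

```shell
#!/usr/bin/env bash
# Decomposed vs precomposed "ö" under the answer's stripping pattern.
export LC_ALL=C   # assumption: byte-wise matching, ASCII-only ranges

strip() {
  local f="$1"
  printf '%s\n' "${f//[^a-zA-Z0-9\/\._\-]/}"
}

nfd=$'Moto\xcc\x88rhead'   # o + U+0308 (UTF-8 bytes CC 88), as HFS+ stores it
nfc=$'Mot\xc3\xb6rhead'    # U+00F6 (UTF-8 bytes C3 B6), precomposed

strip "$nfd"   # only the combining mark is removed
strip "$nfc"   # the whole letter is removed
```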
I know that it's not exactly what you wanted, but if you know the original encoding, perhaps you can use convmv to change the encoding to UTF-8, which should fix most problems.
This worked for me on a folder with some invalid-encoded Polish filenames:
convmv -f cp1250 -t utf8 -r .
Note that this command doesn't actually rename anything; add the --notest option to really rename the files.