Character encoding problem with filenames - find broken filenames
Assuming you are using UTF-8 encoding (the default in Ubuntu), this script should identify the problem filenames and rename them for you.
It works by using find in the C locale (ASCII) to locate filenames that contain unprintable characters. It then checks whether those names are valid UTF-8. If not, it shows you the filename decoded with each of the encodings listed in the enc array, allowing you to select the one that looks right so the file can be renamed.
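To illustrate the check the script relies on (the byte sequence here is made up): a latin1-encoded é fails an iconv conversion from UTF-8, but decodes cleanly when read as latin1:

name=$'caf\xe9.jpg'                                   # "café.jpg" with a latin1-encoded é
iconv -f utf8 <<< "$name" >/dev/null 2>&1 || echo "not valid UTF-8"
iconv -f latin1 -t utf8 <<< "$name"                   # prints: café.jpg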
latin1 was commonly used on older Linux systems, and windows-1252 is commonly used by Windows nowadays (I think). iconv -l
will show you the full list of encodings iconv supports.
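For example, before adding an encoding to the array you can check that iconv accepts that spelling by grepping the list (1252 is just an example):

iconv -l | tr ' ' '\n' | grep -i 1252    # should show something like WINDOWS-1252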
#!/bin/bash
# List of encodings to try (max 10, since the menu reads a single digit).
enc=( latin1 windows-1252 )

# Read NUL-delimited filenames on fd 3 so the interactive read below can use stdin.
while IFS= read -rd '' file <&3; do
    base=${file##*/} dir=${file%/*}
    # If converting from utf8 to utf8 succeeds, we'll assume the filename is ok.
    iconv -f utf8 <<< "$base" >/dev/null 2>&1 && continue
    # Display the filename converted from each enc to utf8.
    printf 'In %s:\n' "$dir/"
    for i in "${!enc[@]}"; do
        name=$(iconv -f "${enc[i]}" <<< "$base")
        printf '%2d - %-12s: %s\n' "$i" "${enc[i]}" "$name"
    done
    printf ' s - Skip\n'
    while true; do
        read -p "? " -n1 ans
        printf '\n'
        if [[ $ans = [0-9] && ${enc[ans]} ]]; then
            name=$(iconv -f "${enc[ans]}" <<< "$base")
            mv -iv "$file" "$dir/$name"
            break
        elif [[ $ans = [Ss] ]]; then
            break
        fi
    done
done 3< <(LC_ALL=C find . -depth -name "*[![:print:][:space:]]*" -print0)
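To use it, save the script (fix-names.sh below is just an example name) and run it from the top of the directory tree that holds the affected files, since its find starts at ".":

cd /path/to/affected/tree     # hypothetical path
bash fix-names.sh

For each suspect filename it prints the candidate decodings and waits for a single keypress: a digit renames the file using that encoding (mv -i still asks before overwriting an existing file), and s or S skips it.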
Try this:
find / | grep -P "[\x80-\xFF]"
This will locate all file and folder names containing non-ASCII characters, and help you find the culprits :P
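One caveat: in a UTF-8 locale, some grep builds interpret the -P pattern and the input as multibyte text rather than raw bytes, which can make this miss names or complain about invalid sequences. Forcing the C locale, roughly as below, makes both tools work on plain bytes (GNU find/grep assumed):

LC_ALL=C find / | LC_ALL=C grep -P '[\x80-\xFF]'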