How to remove invalid characters from filenames?

I have files with invalid characters like these

009_-_�%86ndringshåndtering.html

It should be an Æ, but something has gone wrong in the filename.

Is there a way to just remove all invalid characters?

Or could tr be used somehow?

echo "009_-_�%86ndringshåndtering.html" | tr ???

One way would be with sed:

mv 'file' "$(echo 'file' | sed -e 's/[^A-Za-z0-9._-]/_/g')"

Replace file with your filename, of course. This will replace anything that isn't a letter, number, period, underscore, or dash with an underscore. You can add or remove characters to keep as you like, and/or change the replacement character to anything else, or nothing at all.
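Since the question asks about tr: the same replacement can be sketched with tr -c, which translates every byte that is not in the given set. The function names sanitize_name and rename_sanitized below are made up for illustration, and the loop assumes plain names in the current directory (a / in an argument would be replaced too). Note that tr works byte by byte here, so a two-byte UTF-8 character such as å becomes two underscores.

```shell
#!/bin/sh
# sanitize_name (hypothetical helper): print the given name with every byte
# outside A-Z a-z 0-9 . _ - replaced by an underscore, using the same
# character set as the sed example.
sanitize_name() {
    printf '%s' "$1" | LC_ALL=C tr -c 'A-Za-z0-9._-' '_'
}

# rename_sanitized (hypothetical helper): rename each argument to its
# sanitized form, skipping names that are already clean and refusing to
# overwrite an existing file.
rename_sanitized() {
    for f in "$@"; do
        new=$(sanitize_name "$f")
        [ "$f" = "$new" ] && continue
        if [ -e "$new" ]; then
            echo "skipping $f: $new already exists" >&2
        else
            mv -- "$f" "$new"
        fi
    done
}

# Example (run inside the directory holding the files):
#   rename_sanitized *
```

Calling sanitize_name on its own first is a cheap way to preview the new name before anything is renamed.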


I had some Japanese files with broken filenames recovered from a broken USB stick and the solutions above didn't work for me.

I recommend the detox package:

The detox utility renames files to make them easier to work with. It removes spaces and other such annoyances. It'll also translate or cleanup Latin-1 (ISO 8859-1) characters encoded in 8-bit ASCII, Unicode characters encoded in UTF-8, and CGI escaped characters.

Example usage:

detox -r -v /path/to/your/files
-r Recurse into subdirectories
-v Be verbose about which files are being renamed 
-n Can be used for a dry run (only show what would be changed)

I assume you are on a Linux box and the files were made on a Windows box. Linux typically uses UTF-8 as the character encoding for filenames, while files coming from Windows often end up with names in a legacy code page instead. I think this is the cause of the problem.

I would use "convmv". This is a tool that can convert filenames from one character encoding to another. For Western Europe one of these normally works:

convmv -r -f windows-1252 -t UTF-8 .
convmv -r -f ISO-8859-1 -t UTF-8 .
convmv -r -f cp-850 -t UTF-8 .

If you need to install it on a Debian-based Linux you can do so by running:

sudo apt-get install convmv

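It works for me every time and it does recover the original filename. Note that convmv only prints what it would do by default; once the preview looks right, add --notest to actually rename the files.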
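If it is unclear which source encoding to pass to -f, one way to guess is to decode a sample of the broken bytes under each candidate and see which result reads correctly. This is only a sketch: it uses iconv (not mentioned above, so an assumption that it is installed), the function name preview_decodings is made up, and the sample bytes are illustrative rather than taken from the file in the question.

```shell
#!/bin/sh
# preview_decodings (hypothetical helper): show how a raw byte string reads
# when interpreted in each of the three encodings tried with convmv above.
preview_decodings() {
    for enc in WINDOWS-1252 ISO-8859-1 CP850; do
        printf '%s: ' "$enc"
        printf '%s' "$1" | iconv -f "$enc" -t UTF-8 2>/dev/null \
            || printf '(cannot decode)'
        printf '\n'
    done
}

# Byte 0xC6 (octal 306) is the code for Æ in both ISO-8859-1 and
# Windows-1252, so those two lines should read "Ændring".
preview_decodings "$(printf '\306ndring')"
```

Whichever line looks right is the encoding to give convmv with -f.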

Source: LeaseWebLabs


I assume you mean you want to traverse the filesystem and fix all such files?

Here's the way I'd do it:

find /path/to/files -type f -print0 | \
perl -n0e 'chomp; $new = $_; if($new =~ s/[^[:ascii:]]/_/g) {
  print("Renaming $_ to $new\n"); rename($_, $new);
}'

That would find all files with non-ASCII characters and replace those characters with underscores (_). Use caution though: if a file with the new name already exists, it'll be overwritten. The script can be modified to check for such a case, but I didn't put that in, to keep it simple.
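For the overwrite check, here is a rough shell sketch of the same idea. It differs from the Perl one-liner in two ways: it only rewrites the last path component (so files inside directories with non-ASCII names still resolve), and it renames directories as well as files, deepest entries first. The function name fix_tree is made up, and -mindepth is a GNU/BSD find extension rather than POSIX, so treat this as an assumption about your system.

```shell
#!/bin/sh
# fix_tree DIR (hypothetical helper): replace every non-ASCII byte in the
# names of files and directories below DIR with an underscore.
# Existing targets are never overwritten; such entries are skipped.
fix_tree() {
    find "$1" -mindepth 1 -depth -exec sh -c '
        for path do
            dir=${path%/*}      # everything before the last slash
            base=${path##*/}    # last path component only
            # Bytes 0x80-0xFF are exactly the non-ASCII ones.
            new=$(printf "%s" "$base" | LC_ALL=C tr "\200-\377" "_")
            [ "$base" = "$new" ] && continue
            if [ -e "$dir/$new" ]; then
                echo "Skipping $path: $dir/$new exists" >&2
            else
                echo "Renaming $path to $dir/$new"
                mv -- "$path" "$dir/$new"
            fi
        done
    ' sh {} +
}

# Example:
#   fix_tree /path/to/files
```

Because of -depth, a directory is renamed only after everything inside it has been handled, so the paths passed to mv stay valid.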