How can I find encoding of a file via a script on Linux?

I need to find the encoding of all files that are placed in a directory. Is there a way to find the encoding used?

The file command is not able to do this.

The encoding that is of interest to me is ISO 8859-1. If the encoding is anything else, I want to move the file to another directory.


Solution 1:

It sounds like you're looking for enca. It can guess and even convert between encodings. Just look at the man page.

Or, failing that, use file -i (Linux) or file -I (OS X). That will output MIME-type information for the file, which will also include the character-set encoding. I found a man-page for it, too :)
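Since the follow-up scripts below key off the charset, `file -b --mime-encoding` is handy: it prints only the encoding, with no filename or MIME type to parse away. A minimal sketch (the sample file name is made up for illustration):

```shell
# Create a sample file containing a Latin-1 byte (0xE9 = é in ISO 8859-1).
printf 'caf\xe9\n' > sample-latin1.txt

# -b (brief) suppresses the filename; --mime-encoding prints only the charset,
# which is easier to script against than the full `file -i` output.
enc=$(file -b --mime-encoding sample-latin1.txt)
echo "$enc"    # prints "iso-8859-1" here
```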

Solution 2:

file -bi <file name>

If you'd like to do this for a bunch of files:

find . -type f -not -path '*Eliminate*' -print0 | while IFS= read -r -d '' f; do
    echo "$f -- $(file -bi "$f")"
done

(Using `find -print0` with `read -d ''` keeps this safe for filenames containing spaces or newlines, which the naive `for f in $(find ...)` loop is not.)
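The same pattern answers the original question directly: detect each file's encoding and move everything that is not ISO 8859-1 elsewhere. A minimal sketch, assuming hypothetical directory names `src` and `other-encodings` (and treating us-ascii as acceptable, since ASCII is a strict subset of ISO 8859-1):

```shell
mkdir -p src other-encodings

# Two sample files (hypothetical): one ISO 8859-1, one UTF-8.
printf 'caf\xe9\n'     > src/latin1.txt   # 0xE9 = é in Latin-1
printf 'caf\xc3\xa9\n' > src/utf8.txt     # C3 A9 = é in UTF-8

# Move every file whose detected encoding is neither ISO 8859-1 nor plain ASCII.
find src -type f -print0 | while IFS= read -r -d '' f; do
    enc=$(file -b --mime-encoding "$f")
    case "$enc" in
        iso-8859-1|us-ascii) ;;            # keep in place
        *) mv "$f" other-encodings/ ;;     # move everything else
    esac
done
```

Note that `file` only guesses from the byte content, so short or ambiguous files may be misclassified; for anything critical, spot-check the results.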

Solution 3:

uchardet - An encoding detector library ported from Mozilla.

Usage:

~> uchardet file.java
UTF-8

Various Linux distributions (Debian, Ubuntu, openSUSE, Arch Linux, etc.) provide packages.
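uchardet scripts the same way as `file`: it prints a single encoding name on stdout. A small sketch (the sample file name is an assumption, and the `command -v` check skips gracefully on systems where uchardet isn't installed):

```shell
if command -v uchardet >/dev/null 2>&1; then
    # A sample UTF-8 file (hypothetical name).
    printf 'caf\xc3\xa9\n' > sample.txt
    detected=$(uchardet sample.txt)     # e.g. "UTF-8"
else
    detected="uchardet not installed"
fi
echo "$detected"
```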