How can I find encoding of a file via a script on Linux?
I need to find the encoding of all files that are placed in a directory. Is there a way to find the encoding used?
The file command is not able to do this.
The encoding that is of interest to me is ISO 8859-1. If the encoding is anything else, I want to move the file to another directory.
Solution 1:
It sounds like you're looking for enca. It can guess and even convert between encodings. Just look at the man page.
Or, failing that, use file -i (Linux) or file -I (OS X). That will output MIME-type information for the file, which also includes the character-set encoding. I found a man page for it, too :)
Solution 2:
file -bi <file name>
If you want to do this for a bunch of files:
find . -type f | egrep -v Eliminate | while read -r f; do echo "$f -- $(file -bi "$f")"; done
Solution 3:
uchardet - An encoding detector library ported from Mozilla.
Usage:
~> uchardet file.java
UTF-8
Various Linux distributions (Debian, Ubuntu, openSUSE, Arch Linux, etc.) provide binaries.