How Can I Best Guess the Encoding when the BOM (Byte Order Mark) is Missing?

Maybe you can shell out to a Python script that uses Chardet: Universal Encoding Detector. It is a reimplementation of the character encoding detection used by Firefox, and many different applications use it. Useful links: Mozilla's code, research paper it was based on (ironically, my Firefox fails to correctly detect the encoding of that page), short explanation, detailed explanation.
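A minimal sketch of what such a script might look like, in case it helps. The function name `guess_encoding` and the `latin-1` fallback are my own choices for illustration, not part of Chardet. It checks for a BOM first, then tries a strict UTF-8 decode, and only then asks `chardet.detect`, which returns a dict containing an `encoding` and a `confidence` value. If Chardet is not installed, the sketch just uses the fallback.

```python
import codecs

try:
    import chardet  # third-party package: pip install chardet
except ImportError:
    chardet = None  # the sketch still runs, just with less accuracy

# UTF-32 must be checked before UTF-16: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, fallback: str = "latin-1") -> str:
    """Return a best-guess codec name for `data` (hypothetical helper)."""
    # 1. A BOM, when present, is the strongest signal.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 2. Text in legacy 8-bit encodings rarely happens to be valid UTF-8,
    #    so a clean strict decode is good evidence for UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # 3. Otherwise fall back to Chardet's statistical guess, if available.
    if chardet is not None:
        result = chardet.detect(data)
        if result["encoding"]:
            return result["encoding"]

    # 4. Last resort: latin-1 can decode any byte sequence without error,
    #    though the resulting text may be wrong.
    return fallback
```

You would read the file in binary mode and pass the bytes (or just the first few kilobytes) to `guess_encoding`. Keep in mind that the result of step 3 is a statistical guess, so short inputs in particular can be misidentified.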


Here is how Notepad does that.

There is also the Python Universal Encoding Detector, which you can check out.