convert file type to utf-8 on unix - iconv is failing [duplicate]

Solution 1:

I have similar trouble with MD5 hashes created on Windows XP (under Cygwin), saved to a file, then copied to a Linux system where the hashes are recomputed to verify the copy. If the name of a hashed file contains non-ASCII characters, md5sum reports the file as missing, because it's not decoding the filename correctly. However, if I open the text file containing the hashes in Notepad and change the encoding from ANSI to UTF-8, the Linux md5sum gets the filenames right.

ANSI isn't really a proper encoding name (to anyone but Microsoft), which is why iconv doesn't recognize it. You might get away with windows-1252 instead, but there's no guarantee it will always work:

iconv -f windows-1252 -t utf-8 filename.from > filename.to

For the record, file gives me this on one of those MD5 text files:

$ file tequila.ansi.txt
tequila.ansi.txt: ISO-8859 text
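
The conversion above can be tried end to end on a throwaway file. This is only a sketch under the assumption that the hash list really is windows-1252; the file names sums.ansi.txt and sums.utf8.txt and the sample line are made up for illustration. It writes one md5sum-style line containing a non-ASCII byte, converts it, and then uses a UTF-8 to UTF-8 pass through iconv as a cheap validity check.

```shell
# Hypothetical sample: an md5sum-style line whose file name contains
# the single byte 0xE9 (octal 351), which is "é" in windows-1252.
printf 'd41d8cd98f00b204e9800998ecf8427e  caf\351.txt\n' > sums.ansi.txt

# Convert, assuming the source really is windows-1252.
iconv -f windows-1252 -t utf-8 sums.ansi.txt > sums.utf8.txt

# iconv exits non-zero on invalid input, so converting UTF-8 to UTF-8
# doubles as a validity check on the result.
if iconv -f utf-8 -t utf-8 sums.utf8.txt > /dev/null 2>&1; then
    converted_ok=yes
else
    converted_ok=no
fi
echo "valid UTF-8 after conversion: $converted_ok"
```

Note that the validity check only says the output is well-formed UTF-8. It cannot tell you whether windows-1252 was the right guess for the source.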

Solution 2:

There are several encodings which are called "ANSI" in Windows. In fact, ANSI is a misnomer. iconv has no way of guessing which you want.

The ANSI encoding is the encoding used by the "A" functions in the Windows API (the "W" functions use UTF-16). Which encoding it corresponds to usually depends on your Windows system language. The most common is CP1252 (also known as Windows-1252). So when your editor says ANSI, it means "whatever the API functions use as the default ANSI encoding", which is the default non-Unicode encoding on your system (and thus usually the one used for text files).

So, to convert the file correctly, you should first find out which encoding "ANSI" refers to on your Windows system (or simply ask your text editor there to save using a specific encoding).
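
To illustrate why the guess matters, here is a small sketch that decodes the same byte under two different "ANSI" code pages. The file names are hypothetical, and it assumes your iconv knows the names CP1252 and CP1251 (check with iconv -l). Both conversions succeed without any error, yet they produce different characters.

```shell
# One byte, 0xE9 (octal 351), written to a hypothetical sample file.
printf '\351\n' > sample.ansi.txt

# Western European "ANSI" (CP1252): 0xE9 is "é".
iconv -f CP1252 -t utf-8 sample.ansi.txt > as_cp1252.txt

# Cyrillic "ANSI" (CP1251): the same byte is "й".
iconv -f CP1251 -t utf-8 sample.ansi.txt > as_cp1251.txt

# Dump the resulting bytes: both are valid UTF-8, but they differ.
od -An -tx1 as_cp1252.txt   # bytes c3 a9 0a
od -An -tx1 as_cp1251.txt   # bytes d0 b9 0a
```

Because neither run fails, a wrong guess shows up only as wrong characters in the output, never as an iconv error.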

Solution 3:

Are you sure "ANSI" is a valid source encoding name for iconv? You could try running "file filename.php"; file will often tell you what it thinks the encoding is. You could also try omitting the source encoding when converting, in which case iconv assumes the encoding of your current locale, or you could just try all of them:

for enc in $(iconv -l | tr -d ','); do iconv -f "$enc" -t utf-8 filename.php > "filename.php.$(echo "$enc" | tr -d /)"; done
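
Trying every encoding produces a lot of output files, so a narrower variant may be easier to read. The following is a sketch, not a definitive recipe: the candidate list, sample.txt, and preview.txt are assumptions chosen for illustration, and you would substitute your own file and the code pages plausible for your Windows locale. It prints one line per candidate so the right decoding can be picked by eye.

```shell
# Hypothetical input: "naïve" saved as windows-1252 (0xEF, octal 357, is "ï").
printf 'na\357ve\n' > sample.txt

# Decode the file under a few candidate encodings and collect a preview.
for enc in CP1250 CP1251 CP1252 ISO-8859-15; do
    if out=$(iconv -f "$enc" -t utf-8 sample.txt 2>/dev/null); then
        printf '%-12s %s\n' "$enc" "$out"
    else
        printf '%-12s (conversion failed)\n' "$enc"
    fi
done > preview.txt

# Viewing the preview assumes a UTF-8 terminal.
cat preview.txt
```

The candidate whose line reads correctly is the one to pass to iconv -f for the real conversion.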

Solution 4:

You could just convert it to UTF-8 with Notepad++.