Best way to convert text files between character sets?
Stand-alone utility approach
iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt
-f ENCODING the encoding of the input
-t ENCODING the encoding of the output
You don't have to specify either of these arguments: whichever one you omit defaults to your current locale's encoding, which is usually UTF-8.
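As a quick sanity check, a minimal round trip might look like this (a sketch; in.txt and out.txt are example names, and the byte 0xE9 is "é" in ISO-8859-1):

```shell
# Sketch: convert a small ISO-8859-1 file to UTF-8 and inspect the result.
printf 'caf\xe9\n' > in.txt                  # "café" encoded as ISO-8859-1
iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt
file out.txt                                 # should now report UTF-8 text
```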
Try VIM
If you have vim you can use this. It is not tested for every encoding, but the cool part is that you don't have to know the source encoding:
vim +"set nobomb | set fenc=utf8 | x" filename.txt
Be aware that this command modifies the file in place.
Explanation:

- + : used by vim to run commands directly when opening a file. Usually used to open a file at a specific line: vim +14 file.txt
- | : separator of multiple commands (like ; in bash)
- set nobomb : no UTF-8 BOM
- set fenc=utf8 : set the new encoding to UTF-8 (see :help fileencoding in vim)
- x : save and close the file
- filename.txt : path to the file
- " : quotes are here because of the pipes (otherwise bash would treat them as bash pipes)
Under Linux you can use the very powerful recode command to convert between different charsets and fix line-ending issues. recode -l will show you all of the formats and encodings that the tool can convert between. It is likely to be a VERY long list.
iconv(1)
iconv -f FROM-ENCODING -t TO-ENCODING file.txt
There are also iconv-based tools in many languages.
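In a shell script, the iconv one-liner above scales to many files with a loop. A sketch, with assumed names: the directory utf8 and the sample file are made up, and the source encoding is assumed to be ISO-8859-1.

```shell
# Sketch: batch-convert every .txt file from ISO-8859-1 to UTF-8.
# Results go to a separate directory so the originals stay untouched.
printf 'r\xe9sum\xe9\n' > sample.txt         # demo input in ISO-8859-1
mkdir -p utf8
for f in *.txt; do
    iconv -f ISO-8859-1 -t UTF-8 "$f" > "utf8/$f"
done
```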