Converting ASCII Russian to Russian?
I have a text document that is supposed to be written in Russian, but it seems to be ASCII instead:
Óñòàíîâêà:
1)Çàïóñêàåì QuidamStudioSetup3.15.exe
2)Ïðè çàïðîñå ñåðèéíîãî íîìåðà ââîäèì
How could I convert this to readable Unicode Russian characters?
Solution 1:
It is neither "ASCII" nor "ASCII Russian".
Before Unicode became widespread, most computer systems used the ISO-8859 character encodings, a family of 15 (numbered 1 to 16, with part 12 never published), each for a different region or script (Central European, Cyrillic, Greek...). Windows had its own 'code pages', very similar but with extra characters in otherwise-unused ranges. All these character encodings are 8-bit and differ only in the second half (128-255).
The problem with these encodings is that it's next to impossible for a program to determine which encoding was used to save a file, unless it was specified explicitly (such as in HTML pages; however, plain text files have no such metadata tags). Read the Wikipedia article on Mojibake for a more detailed description.
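To see why a program cannot tell these encodings apart just by looking at the bytes, here is a small Python sketch. The helper name decode_all is made up for illustration; the codec names are the ones Python's standard library uses. The same nine bytes decode without any error under every 8-bit encoding tried, they just come out as different characters.

```python
# The same bytes are "valid" in many 8-bit encodings; only the meaning changes.
CANDIDATES = ("cp1251", "cp1252", "iso8859_5", "koi8_r")

def decode_all(raw, encodings=CANDIDATES):
    """Decode the same bytes under several encodings (illustrative helper)."""
    return {enc: raw.decode(enc) for enc in encodings}

# Bytes of the Russian word as a Windows-1251 program would have saved it.
raw = "Установка".encode("cp1251")

for enc, text in decode_all(raw).items():
    print(f"{enc:10} {text}")
```

None of the four decodings fails, so nothing in the file itself says which one was intended.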
In your example, the document was saved using Windows-1251 (Cyrillic), but your program reads it as if it were Windows-1252 (Western European), which has very different characters in the same positions. To the computer, it looks perfectly okay – it doesn't understand languages or scripts. (There are programs which do statistical analysis in order to determine the correct encoding, though – some web browsers have such a function.)
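Since the damage is only a wrong choice of decoding table, it can usually be undone even when all you have is the garbled text: turn the characters back into bytes using Windows-1252, then decode those bytes as Windows-1251. A minimal Python sketch, assuming the text really went through exactly that mix-up (fix_mojibake is a hypothetical name, not a library function):

```python
def fix_mojibake(garbled, wrong="cp1252", right="cp1251"):
    """Reverse a wrong decoding: recover the original bytes, then decode properly.

    Assumes every character in `garbled` exists in the `wrong` encoding;
    otherwise encode() raises UnicodeEncodeError.
    """
    return garbled.encode(wrong).decode(right)

print(fix_mojibake("Óñòàíîâêà:"))  # Установка:
```

ASCII characters such as digits and the file name are the same in both encodings, so they pass through unchanged.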
There are several ways you could convert such text to Unicode:
- Use online tools such as this one or this one.
- Use your web browser:
  Drag the .txt file into the browser.
  From View → Character Encoding (or Firefox → Web Developer → Character Encoding, or Wrench → Tools → Encoding), pick the correct original encoding: "Cyrillic (Windows-1251)" in your case.
- Use the Notepad2 text editor:
  Open the file.
  From File → Encoding → Recode..., choose the right original encoding.
- Use GNU iconv, with Windows binaries either from GnuWin32 or Gettext for Win32:
  iconv -f cp1251 -t utf-8 < myfile.txt > myfile.fixed.txt
Windows Notepad will correctly read UTF-8 and UTF-16 encoded text.
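If Python is easier to get hold of than iconv, the same file conversion can be sketched in a few lines. This is only an illustration: convert_to_utf8 is a made-up helper name, and the default source encoding is an assumption based on the example in the question.

```python
from pathlib import Path

def convert_to_utf8(src, dst, source_encoding="cp1251"):
    """Re-encode a text file from `source_encoding` to UTF-8.

    Works on raw bytes so that line endings are left exactly as they were.
    Raises UnicodeDecodeError if a byte is undefined in `source_encoding`.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

Calling convert_to_utf8("myfile.txt", "myfile.fixed.txt") would then do the same job as the iconv command above, writing the result to a new file rather than overwriting the original.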
Solution 2:
You could convert the encoding using a program such as iconv - but you'll need to know what encoding was used.
It seems to be Windows-1251 according to a random web page found by Google.
Установка:
1) Запускаем QuidamStudioSetup3.15.exe
2) При запросе серийного номера вводим
I don't know Russian but pasting that into translate.google.com suggests that the above is plausible:
installation:
1) Run QuidamStudioSetup3.15.exe
2) When prompted, enter the serial number
So ...
iconv -f CP1251 -t UTF-8 document.txt > document.fixed.txt
should convert your text file into something that can be opened and read in Notepad.