Converting ASCII Russian to Russian?

I have a text document that is supposed to be in Russian, but it seems to be in ASCII instead:

Óñòàíîâêà:
1)Çàïóñêàåì QuidamStudioSetup3.15.exe
2)Ïðè çàïðîñå ñåðèéíîãî íîìåðà ââîäèì

How can I convert this to readable Unicode Russian characters?


Solution 1:

It is neither "ASCII" nor "ASCII Russian".

Before Unicode became widespread, most computer systems used 8-bit character encodings such as the ISO 8859 family, which has 15 published parts, each for a different region (Central European, Cyrillic, Greek...). Windows had its own 'code pages': some are close to their ISO counterparts (Windows-1252 is ISO 8859-1 with extra glyphs in an otherwise-unused range), while others, such as the Cyrillic Windows-1251, use a different layout. All of these encodings share ASCII in the first half (0-127) and differ only in the second half (128-255).
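To make the "same first half, different second half" point concrete, here is a small Python sketch that decodes a single byte under four of these encodings. Python and its built-in codec names (`cp1251`, `cp1252`, `iso8859_5`, `iso8859_7`) are choices made for illustration only:

```python
from __future__ import annotations

# Python codec names: Windows-1251 (Cyrillic), Windows-1252 (Western European),
# ISO 8859-5 (Cyrillic), ISO 8859-7 (Greek).
ENCODINGS = ["cp1251", "cp1252", "iso8859_5", "iso8859_7"]


def decode_byte(value: int) -> dict[str, str]:
    """Decode one byte value under each encoding in ENCODINGS."""
    raw = bytes([value])
    return {enc: raw.decode(enc) for enc in ENCODINGS}


if __name__ == "__main__":
    print(decode_byte(0x41))  # lower half: "A" under every encoding
    print(decode_byte(0xD3))  # upper half: a different character under each
```

Byte 0x41 comes out as "A" every time, while 0xD3 is "У" in Windows-1251 and "Ó" in Windows-1252, and yet another letter under each ISO part.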

The problem with these encodings is that it's next to impossible for a program to determine which one was used to save a file unless it is specified explicitly (as in HTML pages; plain text files, however, have no such metadata). See the Wikipedia article on mojibake for a more detailed description.
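Part of why detection is so hard is that decoding with the wrong single-byte encoding usually doesn't fail; it just produces different characters. A minimal Python sketch, assuming the file holds the Windows-1251 bytes of the first word of the question (the candidate list is an arbitrary, hypothetical shortlist):

```python
from __future__ import annotations

# Raw bytes as they would sit on disk: "Установка:" saved as Windows-1251.
RAW = "Установка:".encode("cp1251")

# Hypothetical shortlist of single-byte encodings a program might try.
CANDIDATES = ["cp1251", "cp1252", "iso8859_5", "koi8_r"]


def decodes_cleanly(data: bytes, candidates: list[str]) -> list[str]:
    """Return every candidate encoding that decodes `data` without raising an error."""
    ok = []
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        ok.append(enc)
    return ok


if __name__ == "__main__":
    for enc in decodes_cleanly(RAW, CANDIDATES):
        print(f"{enc:10} {RAW.decode(enc)}")
```

All four candidates accept these bytes without complaint, so the absence of a decoding error says nothing about which one is right; only a reader (or a statistical guess) can tell that the `cp1251` line is real Russian.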

In your example, the document was saved using Windows-1251 (Cyrillic), but your program reads it as if it were Windows-1252 (Western European), which has very different characters at the same byte positions. To the computer, the result looks perfectly okay, because it doesn't understand languages or scripts. (Some programs do statistical analysis to guess the correct encoding, though; some web browsers have such a function.)
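The mix-up can be reproduced, and reversed, in a few lines of Python. This sketch assumes the garbled text went through exactly one wrong step (Windows-1251 bytes displayed as Windows-1252); the function names are made up for illustration:

```python
def misread(text: str) -> str:
    """Simulate the bug: bytes saved as Windows-1251, displayed as Windows-1252."""
    return text.encode("cp1251").decode("cp1252")


def repair(garbled: str) -> str:
    """Reverse it: turn the characters back into the original bytes, then decode as Windows-1251."""
    return garbled.encode("cp1252").decode("cp1251")


if __name__ == "__main__":
    print(misread("Установка:"))  # Óñòàíîâêà:
    print(repair("1)Çàïóñêàåì QuidamStudioSetup3.15.exe"))  # 1)Запускаем QuidamStudioSetup3.15.exe
```

This round trip only works while the garbled string is still intact; if it has since been saved through yet another encoding, or characters were replaced along the way, the original bytes may no longer be recoverable like this.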

There are several ways you could convert such text to Unicode:

  • Use an online encoding-conversion tool.

  • Use your web browser:

    1. Drag the .txt file into the browser.

    2. From View → Character Encoding (or, in Firefox, Firefox → Web Developer → Character Encoding; in Chrome, Wrench → Tools → Encoding), pick the correct original encoding: "Cyrillic (Windows-1251)" in your case.

  • Use the Notepad2 text editor:

    1. Open the file.

    2. From File → Encoding → Recode..., choose the right original encoding.

  • Use GNU iconv, with Windows binaries either from GnuWin32 or Gettext for Win32.

    iconv -f cp1251 -t utf-8 < myfile.txt > myfile.fixed.txt

    Windows Notepad will correctly read UTF-8 and UTF-16 encoded text.
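If none of these tools is at hand, the same conversion as the iconv command can be sketched in Python. The script name `recode.py` and its argument order are assumptions for illustration; it reads the file's raw bytes, decodes them as Windows-1251, and writes UTF-8:

```python
import sys
from pathlib import Path


def recode_bytes(data: bytes, src_encoding: str = "cp1251") -> bytes:
    """Decode bytes from the legacy encoding and re-encode them as UTF-8."""
    return data.decode(src_encoding).encode("utf-8")


def convert(src: Path, dst: Path, src_encoding: str = "cp1251") -> None:
    """Rough equivalent of: iconv -f cp1251 -t utf-8 < src > dst."""
    # Working on raw bytes leaves line endings (CRLF or LF) exactly as they were.
    dst.write_bytes(recode_bytes(src.read_bytes(), src_encoding))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python recode.py myfile.txt myfile.fixed.txt
    convert(Path(sys.argv[1]), Path(sys.argv[2]))
```

If an older editor still misreads the result, changing `"utf-8"` to `"utf-8-sig"` makes Python prepend a byte-order mark, which may help it recognize the file as UTF-8.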

Solution 2:

You could convert the encoding with a program such as iconv, but you'll need to know which encoding was used.

According to a random web page found through Google, it seems to be Windows-1251, which gives:

Установка:
1) Запускаем QuidamStudioSetup3.15.exe
2) При запросе серийного номера вводим

I don't know Russian, but pasting that into translate.google.com suggests that the above is plausible:

Installation:
1) Run QuidamStudioSetup3.15.exe
2) When prompted, enter the serial number

So ...

iconv -f CP1251 -t UTF-8 document.txt > document.utf8.txt

Should convert your text file into something that can be opened and read in Notepad.
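Solution 2 found the encoding by trying a candidate and checking that the result made sense. That trial can be partly automated with a crude heuristic, sketched here in Python: decode the bytes with each candidate and keep the one that yields the most of the ten most frequent lowercase Russian letters. The candidate list and scoring rule are illustrative assumptions, much simpler than real statistical detectors:

```python
from __future__ import annotations

# Hypothetical shortlist of encodings to try, in order of preference for ties.
CANDIDATES = ["cp1251", "cp1252", "iso8859_5", "koi8_r"]

# The ten most frequent letters in Russian text (Cyrillic, lowercase).
FREQUENT = set("оеаинтсрвл")


def score(text: str) -> int:
    """Count characters that are common lowercase Russian letters."""
    return sum(ch in FREQUENT for ch in text)


def guess_encoding(data: bytes, candidates: list[str] = CANDIDATES) -> str | None:
    """Return the candidate whose decoding looks most like Russian, or None."""
    best, best_score = None, -1
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # this candidate can't represent the bytes at all
        s = score(text)
        if s > best_score:  # strict ">" keeps the earlier candidate on ties
            best, best_score = enc, s
    return best


if __name__ == "__main__":
    sample = "2)При запросе серийного номера вводим".encode("cp1251")
    print(guess_encoding(sample))
```

With only a few words, or text that is mostly Latin letters, digits, or capitals, the scores get close and the guess becomes unreliable, so it is still worth checking the result by eye as Solution 2 did.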