Converting ascii Russian to Russian?

I have a text document that is supposed to be written in Russian, but it seems to be ASCII instead:

Óñòàíîâêà:
1)Çàïóñêàåì QuidamStudioSetup3.15.exe
2)Ïðè çàïðîñå ñåðèéíîãî íîìåðà ââîäèì

How can I convert this to readable Russian Unicode characters?

Solution 1:

It is neither "ASCII" nor "ASCII Russian".

Before Unicode became widespread, many computer systems used the ISO-8859 character encodings, a family of 15 parts in use, each for a different region (Central European, Cyrillic, Greek...). Windows had its own 'code pages' for the same regions: Windows-1252 is ISO-8859-1 plus extra glyphs in the otherwise-unused 128-159 range, while Windows-1251 arranges the Cyrillic letters differently from ISO-8859-5. All these encodings are 8-bit, share ASCII in the first half (0-127), and differ only in the second half (128-255).

The problem with these encodings is that it is next to impossible for a program to reliably determine which encoding was used to save a file unless the encoding is specified explicitly. HTML pages can declare it, but plain text files have no such metadata. Read the Wikipedia article on Mojibake for a more detailed description.

In your example, the document was saved using Windows-1251 (Cyrillic), but your program reads it as if it were Windows-1252 (Western European), which assigns very different characters to the same byte values. For example, byte 0xD3 is "У" in Windows-1251 but "Ó" in Windows-1252. To the computer the result looks perfectly okay, because it doesn't understand languages or scripts. (Some programs, including some web browsers, do use statistical analysis to guess the correct encoding, though.)
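To see the mechanism concretely, the misreading can be reproduced with iconv: encode Russian text as Windows-1251 bytes, then decode those same bytes as Windows-1252. This is a minimal sketch, assuming a Unix-like shell and an iconv that recognises the CP1251 and CP1252 names (GNU iconv does); `misread_1251_as_1252` is a hypothetical helper name used only for illustration.

```
#!/bin/sh
# Sketch: reproduce the mojibake described above.
# misread_1251_as_1252 is a hypothetical helper name for this example.
misread_1251_as_1252() {
    # Step 1: UTF-8 text -> Windows-1251 bytes (how the file was saved).
    # Step 2: read those bytes as Windows-1252 (how the program opened it).
    printf '%s' "$1" | iconv -f UTF-8 -t CP1251 | iconv -f CP1252 -t UTF-8
}

misread_1251_as_1252 'Установка'
echo
```

This prints Óñòàíîâêà, the first word of the garbled document in the question.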

There are several ways you could convert such text to Unicode:

- Use an online encoding-conversion tool.
- Use your web browser:
  1. Drag the .txt file into the browser.
  2. From View → Character Encoding (or Firefox → Web Developer → Character Encoding, or Wrench → Tools → Encoding), pick the correct original encoding: "Cyrillic (Windows-1251)" in your case.
- Use the Notepad2 text editor:
  1. Open the file.
  2. From File → Encoding → Recode..., choose the correct original encoding.
- Use GNU iconv (Windows binaries are available from GnuWin32 or Gettext for Win32):
  ```
  iconv -f cp1251 -t utf-8 < myfile.txt > myfile.fixed.txt
  ```

Windows Notepad will correctly read UTF-8 and UTF-16 encoded text.
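The iconv approach assumes the file still contains the original Windows-1251 bytes. If the garbled text has instead been copied out of a program that misread it and saved again as UTF-8, converting it from cp1251 would only garble it further. A sketch for that case, assuming the misreading was Windows-1252 as described above: convert the characters back to Windows-1252 to recover the original bytes, then decode those bytes as Windows-1251. `unmojibake` is a hypothetical helper name, and the file names in the comment are placeholders.

```
#!/bin/sh
# Sketch: undo Windows-1252 mojibake that has already been re-saved as UTF-8.
# unmojibake is a hypothetical helper name; it filters stdin to stdout.
unmojibake() {
    # Step 1: UTF-8 -> CP1252 turns "Ó", "ñ", ... back into the original bytes.
    # Step 2: decode those bytes as CP1251 and re-encode the result as UTF-8.
    iconv -f UTF-8 -t CP1252 | iconv -f CP1251 -t UTF-8
}

printf '%s\n' 'Óñòàíîâêà:' | unmojibake
# For a whole file (placeholder names):
# unmojibake < garbled.txt > fixed.txt
```

ASCII characters such as the file name QuidamStudioSetup3.15.exe pass through both steps unchanged.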

Solution 2:

You could convert the encoding using a program such as iconv, but you'll need to know which encoding was used.

It seems to be Windows-1251 according to a random web page found by Google.

Установка:
1) Запускаем QuidamStudioSetup3.15.exe
2) При запросе серийного номера вводим

I don't know Russian, but pasting that into translate.google.com suggests that the above is plausible:

Installation:
1) Run QuidamStudioSetup3.15.exe
2) When prompted, enter the serial number

So something like

```
iconv -f CP1251 -t UTF-8 document.txt > document.utf8.txt
```

should convert your text file into something that can be opened and read in Notepad (the output file name is only an example).
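If no web page is at hand to identify the encoding, a rough local alternative is to decode the file with a few candidate Cyrillic encodings and see which result reads as Russian. This is a sketch, assuming the file still holds the original bytes; the candidate list (CP1251, KOI8-R, ISO-8859-5, CP866) is an assumption covering common Cyrillic code pages, and `try_encodings` is a hypothetical helper name.

```
#!/bin/sh
# Sketch: print the first line of a file decoded with several candidate
# Cyrillic encodings, so a human can pick the one that reads as Russian.
# try_encodings and the candidate list are assumptions for this example.
try_encodings() {
    for enc in CP1251 KOI8-R ISO-8859-5 CP866; do
        printf '%s: ' "$enc"
        # -c: omit bytes that cannot be converted instead of stopping
        iconv -c -f "$enc" -t UTF-8 < "$1" | head -n 1
    done
}

# Usage (placeholder file name):
# try_encodings document.txt
```

For the document in the question, the CP1251 line should be the one that reads as Установка:, while the other candidates produce gibberish.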
