How to use unicode characters in Windows command line?
Solution 1:
Try:
chcp 65001
which will change the code page to UTF-8. Also, you need to use a Lucida Console font.
Solution 2:
My background: I have used Unicode input/output in a console for years (and do it a lot daily; moreover, I develop support tools for exactly this task). There are very few problems, as long as you understand the following facts/limitations:
- CMD and “console” are unrelated factors. CMD.exe is just one of the programs which are ready to “work inside” a console (“console applications”).
- AFAIK, CMD has perfect support for Unicode; you can enter/output all Unicode chars when any codepage is active.
- Windows’ console has A LOT of support for Unicode, but it is not perfect (just “good enough”; see below).
- chcp 65001 is very dangerous. Unless a program was specially designed to work around defects in the Windows API (or uses a C runtime library which has these workarounds), it would not work reliably. Win8 fixes ½ of these problems with cp65001, but the rest is still applicable to Win10.
- I work in cp1252. As I already said: to input/output Unicode in a console, one does not need to set the codepage.
The details
- To read/write Unicode to a console, an application (or its C runtime library) should be smart enough to use the Console-I/O API rather than the File-I/O API. (For an example, see how Python does it.)
- Likewise, to read Unicode command-line arguments, an application (or its C runtime library) should be smart enough to use the corresponding API.
- Console font rendering supports only Unicode characters in the BMP (in other words: below U+10000). Only simple text rendering is supported (so European, and some East Asian, languages should work fine, as long as one uses precomposed forms). [There is a minor fine print here for East Asian and for characters U+0000, U+0001, U+30FB.]
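To make the last limitation concrete, here is a minimal sketch (not part of any console API) that flags characters a BMP-only, simple-rendering console is likely to mishandle. The function names `console_render_issues` and `to_precomposed` are made up for this illustration, and treating every combining mark as a problem is a simplifying assumption; it does not cover the East Asian fine print mentioned above.

```python
import unicodedata


def console_render_issues(text):
    """Return (char, reason) pairs for characters that may not render.

    Assumption for this sketch: anything at or above U+10000, and any
    combining mark, counts as a potential problem.
    """
    issues = []
    for ch in text:
        if ord(ch) >= 0x10000:
            issues.append((ch, "outside the BMP"))
        elif unicodedata.combining(ch):
            issues.append((ch, "combining mark (not precomposed)"))
    return issues


def to_precomposed(text):
    """Normalize to NFC so that base letter + accent pairs become one
    precomposed character wherever Unicode defines one."""
    return unicodedata.normalize("NFC", text)
```

Running the text through `to_precomposed` before printing is usually enough for European languages; characters outside the BMP cannot be fixed this way and would need a different terminal.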
Practical considerations
- The defaults on Windows are not very helpful. For the best experience, one should tune up 3 pieces of configuration:
  - For output: a comprehensive console font. For best results, I recommend my builds. (The installation instructions are present there, and also listed in other answers on this page.)
  - For input: a capable keyboard layout. For best results, I recommend my layouts.
  - For input: allow HEX input of Unicode.
- One more gotcha with “Pasting” into a console application (very technical):
  - HEX input delivers a character on KeyUp of Alt; all the other ways to deliver a character happen on KeyDown; so many applications are not ready to see a character on KeyUp. (Only applicable to applications using the Console-I/O API.)
  - Conclusion: many applications would not react to HEX input events.
  - Moreover, what happens with a “Pasted” character depends on the current keyboard layout: if the character can be typed without using prefix keys (but with an arbitrarily complicated combination of modifiers, as in Ctrl-Alt-AltGr-Kana-Shift-Gray*) then it is delivered on an emulated keypress. This is what any application expects, so pasting anything which contains only such characters is fine.
  - However, the “other” characters are delivered by emulating HEX input.
  Conclusion: unless your keyboard layout supports input of A LOT of characters without prefix keys, some buggy applications may skip characters when you Paste via the Console’s UI: Alt-Space E P. (This is why I recommend using my keyboard layouts!)
One should also keep in mind that the “alternative, ‘more capable’ consoles” for Windows are not consoles at all. They do not support Console-I/O APIs, so the programs which rely on these APIs to work would not function. (The programs which use only “File-I/O APIs to the console filehandles” would work fine, though.)
One example of such a non-console is a part of Microsoft’s PowerShell. I do not use it; to experiment, press and release WinKey, then type powershell.
(On the other hand, there are programs such as ConEmu or ANSICON which try to do more: they “attempt” to intercept Console-I/O APIs to make “true console applications” work too. This definitely works for toy example programs; in real life, this may or may not solve your particular problems. Experiment.)
Summary
- set font, keyboard layout (and optionally, allow HEX input).
- use only programs which go through Console-I/O APIs, and accept Unicode command-line arguments. For example, any cygwin-compiled program should be fine. As I already said, CMD is fine too.
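For readers wondering what “going through Console-I/O APIs” looks like in practice, here is a rough sketch, assuming Python 3 with ctypes. It calls the Win32 functions GetStdHandle, GetConsoleMode and WriteConsoleW when stdout is a real console, and falls back to writing UTF-8 bytes through the file API otherwise (redirected output, or a non-Windows system). The function name `write_text` and the choice of UTF-8 for the fallback are assumptions of this sketch; it is not a copy of what Python or any C runtime does internally.

```python
import sys


def write_text(text):
    """Write text to stdout; return "console" if WriteConsoleW was used,
    "file" if the byte-oriented file path was used instead."""
    sys.stdout.flush()  # keep ordering with anything already buffered
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
        kernel32.GetStdHandle.restype = wintypes.HANDLE
        kernel32.GetConsoleMode.argtypes = [
            wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
        kernel32.GetConsoleMode.restype = wintypes.BOOL
        kernel32.WriteConsoleW.argtypes = [
            wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
            ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID]
        kernel32.WriteConsoleW.restype = wintypes.BOOL

        STD_OUTPUT_HANDLE = 0xFFFFFFF5  # (DWORD)-11
        handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
        mode = wintypes.DWORD()
        # GetConsoleMode succeeds only for a real console handle.
        if handle and kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
            units = len(text.encode("utf-16-le")) // 2  # UTF-16 code units
            written = wintypes.DWORD()
            if kernel32.WriteConsoleW(handle, text, units,
                                      ctypes.byref(written), None):
                return "console"

    # File-I/O path: bytes in a fixed encoding (UTF-8 is assumed here).
    raw = getattr(sys.stdout, "buffer", None)
    if raw is not None:
        raw.write(text.encode("utf-8"))
        raw.flush()
    else:
        sys.stdout.write(text)
    return "file"
```

The point of the console branch is that the active codepage never enters the picture: the text is handed over as UTF-16, which is why no chcp is needed for such programs.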
UPD: Initially, for a bug in cp65001, I was mixing up Kernel and CRTL layers (UPD²: and Windows user-mode API!). Also: Win8 fixes one half of this bug; I clarified the section about “better console” applications, and added a reference to how Python does it.
Solution 3:
I had the same problem (I'm from the Czech Republic). I have an English installation of Windows, and I have to work with files on a shared drive. Paths to the files include Czech-specific characters.
The solution that works for me is:
In the batch file, change the code page.
My batch file:
chcp 1250
copy "O:\VEŘEJNÉ\ŽŽŽŽŽŽ\Ž.xls" c:\temp
The batch file has to be saved in CP 1250.
Note that the console will not show characters correctly, but it will understand them...
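If your editor cannot save in CP 1250, one way to produce such a file is to write it from a script. This is only a sketch, assuming Python 3; the helper name `write_batch` and the output file name `copy_file.bat` are made up for the example, and the two batch lines are the ones shown above.

```python
import os
import tempfile

BATCH_LINES = [
    "chcp 1250",
    'copy "O:\\VEŘEJNÉ\\ŽŽŽŽŽŽ\\Ž.xls" c:\\temp',
]


def write_batch(path, lines, encoding="cp1250"):
    """Write the lines as a batch file in the given single-byte code page.

    Raises UnicodeEncodeError if a character is not representable in
    that code page, instead of silently saving a broken path.
    """
    text = "\r\n".join(lines) + "\r\n"  # batch files conventionally use CRLF
    data = text.encode(encoding)
    with open(path, "wb") as f:
        f.write(data)
    return data


if __name__ == "__main__":
    target = os.path.join(tempfile.gettempdir(), "copy_file.bat")
    write_batch(target, BATCH_LINES)
    print("wrote", target)
```

The encoding passed to `write_batch` has to match the number given to chcp inside the file, otherwise the path bytes will be interpreted in the wrong code page.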
Solution 4:
Check the language for non-Unicode programs (the system locale). If you have problems with Russian in the Windows console, then you should set Russian as that language.
Solution 5:
It is quite difficult to change the default code page of the Windows console. When you search the web you find different proposals; however, some of them may break your Windows entirely, i.e. your PC does not boot anymore.
The safest solution is this one:
Go to your Registry key HKEY_CURRENT_USER\Software\Microsoft\Command Processor and add a String value Autorun = chcp 65001.
Or you can use this small Batch-Script for the most common code pages.
@ECHO off
SET ROOT_KEY="HKEY_CURRENT_USER"
FOR /f "skip=2 tokens=3" %%i in ('reg query HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage /v OEMCP') do set OEMCP=%%i
ECHO System default values:
ECHO.
ECHO ...............................................
ECHO Select Codepage
ECHO ...............................................
ECHO.
ECHO 1 - CP1252
ECHO 2 - UTF-8
ECHO 3 - CP850
ECHO 4 - ISO-8859-1
ECHO 5 - ISO-8859-15
ECHO 6 - US-ASCII
ECHO.
ECHO 9 - Reset to System Default (CP%OEMCP%)
ECHO 0 - EXIT
ECHO.
SET /P CP="Select a Codepage: "
if %CP%==1 (
echo Set default Codepage to CP1252
reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 1252>nul" /f
) else if %CP%==2 (
echo Set default Codepage to UTF-8
reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 65001>nul" /f
) else if %CP%==3 (
echo Set default Codepage to CP850
reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 850>nul" /f
) else if %CP%==4 (
echo Set default Codepage to ISO-8859-1
reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 28591>nul" /f
) else if %CP%==5 (
echo Set default Codepage to ISO-8859-15
reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 28605>nul" /f
) else if %CP%==6 (
echo Set default Codepage to ASCII
reg add "%ROOT_KEY%\Software\Microsoft\Command Processor" /v Autorun /t REG_SZ /d "@chcp 20127>nul" /f
) else if %CP%==9 (
echo Reset Codepage to System Default
reg delete "%ROOT_KEY%\Software\Microsoft\Command Processor" /v AutoRun /f
) else if %CP%==0 (
echo Bye
) else (
echo Invalid choice
pause
)
Using @chcp 65001>nul instead of chcp 65001 suppresses the output "Active code page: 65001" you would get every time you start a new command line window.
A full list of all available numbers can be found in Code Page Identifiers.
Note that the settings will apply only to the current user. If you would like to set it for all users, replace the line SET ROOT_KEY="HKEY_CURRENT_USER" with SET ROOT_KEY="HKEY_LOCAL_MACHINE".
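Before picking one of the code pages from the menu, it can help to check whether the characters you care about exist in it at all. The following is a small sketch, assuming Python 3; the table pairing the Windows code page numbers used in the script with Python codec names, and the helper name `representable`, are my own additions for illustration.

```python
import codecs

# Windows code page number -> Python codec name (assumed equivalents)
CODE_PAGES = {
    1252: "cp1252",
    65001: "utf-8",
    850: "cp850",
    28591: "iso-8859-1",
    28605: "iso-8859-15",
    20127: "ascii",
}


def representable(text, code_page):
    """Return True if every character of text can be encoded in the
    codec assumed to correspond to the given Windows code page."""
    codec = codecs.lookup(CODE_PAGES[code_page]).name
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False
```

For example, the euro sign is available in ISO-8859-15 and CP1252 but not in ISO-8859-1, which is one reason to prefer 28605 over 28591 for Western European text.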