What is the use of wchar_t in general programming?
Today I was learning some C++ basics and came across wchar_t. I could not figure out why we actually need this data type, or how to use it.
wchar_t is intended for representing text in fixed-width, multi-byte encodings. Where wchar_t is 2 bytes in size (as on Windows), it can represent text in any 2-byte encoding. It can also be used for variable-width multi-byte encodings, of which the most common is UTF-16.
On platforms where wchar_t is 4 bytes in size (as on Linux and macOS), it can represent any Unicode text using UCS-4. Where it is only 2 bytes, it can represent the whole of Unicode only in a variable-width encoding (usually UTF-16). It's more common to use char with a variable-width encoding, e.g. UTF-8 or GB 18030.
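To check which case applies on a given compiler, a minimal sketch along these lines may help. It assumes wide literals are encoded as UTF-16 or UTF-32, which holds for mainstream compilers but is not something the answer above states. The names wide_code_units, kWcharHoldsAnyCodePoint, kUnitsAboveBmp and kUnitsEuro are made up for illustration.

```cpp
#include <cstddef>
#include <cwchar>   // WCHAR_MAX

// Number of wchar_t code units in a wide string literal, excluding the
// terminating null. (Hypothetical helper, not a standard function.)
template <std::size_t N>
constexpr std::size_t wide_code_units(const wchar_t (&)[N]) {
    return N - 1;
}

// True when a single wchar_t can hold every Unicode code point (up to U+10FFFF).
constexpr bool kWcharHoldsAnyCodePoint = WCHAR_MAX >= 0x10FFFF;

// U+1F600 lies above U+FFFF, so it needs two units when wchar_t is 16 bits
// wide (a UTF-16 surrogate pair) and one unit when wchar_t is 32 bits wide.
constexpr std::size_t kUnitsAboveBmp = wide_code_units(L"\U0001F600");

// U+20AC fits in 16 bits, so it is a single unit on either kind of platform.
constexpr std::size_t kUnitsEuro = wide_code_units(L"\u20AC");
```

Printing sizeof(wchar_t) next to these constants typically shows 4 with GCC or Clang on Linux and 2 with MSVC on Windows.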
About the only modern operating system to use wchar_t extensively is Windows. This is because Windows adopted Unicode before it was extended past U+FFFF, so a fixed-width 2-byte encoding (UCS-2) appeared sensible. UCS-2 is now insufficient to represent the whole of Unicode, so Windows uses UTF-16, still with 2-byte wchar_t code units.
wchar_t is a wide character type. It is used to represent characters that need more storage than a regular char provides. It is, for example, widely used in the Windows API.
However, the size of wchar_t is implementation-defined, and it is not guaranteed to be larger than char. If you need to support a specific character format wider than 8 bits, you may want to turn to char32_t and char16_t (available since C++11), which are guaranteed to be at least 32 and 16 bits wide respectively.
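As a rough illustration of the difference, the sketch below stores the same character in a char16_t array and a char32_t array and counts the code units. The helper names code_units and from_surrogates are hypothetical, and the surrogate arithmetic is the standard UTF-16 decoding rule rather than anything taken from the answer above. It needs C++11 or later.

```cpp
#include <cstddef>

// Code units in a string literal or array of any character type,
// excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// Combine a UTF-16 surrogate pair into the code point it encodes.
// Assumes `high` is in [0xD800, 0xDBFF] and `low` is in [0xDC00, 0xDFFF];
// this sketch does no validation.
constexpr char32_t from_surrogates(char16_t high, char16_t low) {
    return static_cast<char32_t>(
        0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                + (static_cast<char32_t>(low) - 0xDC00));
}

// The u prefix gives UTF-16 code units, the U prefix gives UTF-32 code units.
constexpr char16_t kUtf16[] = u"\U0001F600";  // two units: 0xD83D 0xDE00
constexpr char32_t kUtf32[] = U"\U0001F600";  // one unit:  0x1F600
```

Because the encoding of these literals does not vary by platform, code_units(kUtf16) is 2 and code_units(kUtf32) is 1 everywhere, which is the portability that wchar_t cannot offer.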
wchar_t is used when you need to store characters with codes greater than 255 (that is, values larger than an 8-bit char can hold).
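A small sketch of what that looks like in code, assuming the compiler encodes wide literals as Unicode (the usual case, though not guaranteed) and C++14 or later for the loop inside a constexpr function. The names kEuro, fits_in_8_bits and count_above_255 are invented for this example.

```cpp
// A wide character literal takes the L prefix. U+20AC (EURO SIGN) has the
// code 0x20AC, far above 255.
constexpr wchar_t kEuro = L'\u20AC';

// True when the character's code is at most 255, i.e. it would fit in an
// 8-bit char. The cast avoids surprises on platforms where wchar_t is signed.
constexpr bool fits_in_8_bits(wchar_t c) {
    return static_cast<unsigned long>(c) <= 0xFFul;
}

// Count the characters in a null-terminated wide string whose code exceeds 255.
constexpr int count_above_255(const wchar_t* s) {
    int n = 0;
    for (; *s != L'\0'; ++s) {
        if (!fits_in_8_bits(*s)) ++n;
    }
    return n;
}
```

For runtime text, the same L prefix works with std::wstring (from the string header), for example std::wstring price = L"\u20AC 10";.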
A char can typically take 256 different values, which correspond to entries in tables such as the ISO Latin ones. A wide character, on the other hand, can take 65,536 or more values (depending on the platform), which correspond to Unicode values. Unicode is an international standard that allows the encoding of characters for virtually all languages and commonly used symbols.