What causes a char to be signed or unsigned when using gcc?
What determines whether a char in C (using gcc) is signed or unsigned? I know that the standard doesn't dictate one over the other and that I can check CHAR_MIN and CHAR_MAX from limits.h, but I want to know what triggers one over the other when using gcc.
If I read limits.h from libgcc-6, I see that there is a macro __CHAR_UNSIGNED__ which defines a "default" char as signed or unsigned, but I'm unsure whether this is set by the compiler at its build time.
I tried to list GCC's predefined macros with
$ gcc -dM -E -x c /dev/null | grep -i CHAR
#define __UINT_LEAST8_TYPE__ unsigned char
#define __CHAR_BIT__ 8
#define __WCHAR_MAX__ 0x7fffffff
#define __GCC_ATOMIC_CHAR_LOCK_FREE 2
#define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2
#define __SCHAR_MAX__ 0x7f
#define __WCHAR_MIN__ (-__WCHAR_MAX__ - 1)
#define __UINT8_TYPE__ unsigned char
#define __INT8_TYPE__ signed char
#define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2
#define __CHAR16_TYPE__ short unsigned int
#define __INT_LEAST8_TYPE__ signed char
#define __WCHAR_TYPE__ int
#define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2
#define __SIZEOF_WCHAR_T__ 4
#define __INT_FAST8_TYPE__ signed char
#define __CHAR32_TYPE__ unsigned int
#define __UINT_FAST8_TYPE__ unsigned char
but wasn't able to find __CHAR_UNSIGNED__
Background: I've some code which I compile on two different machines:
Desktop PC:
- Debian GNU/Linux 9.1 (stretch)
- gcc version 6.3.0 20170516 (Debian 6.3.0-18)
- Intel(R) Core(TM) i3-4150
- libgcc-6-dev: 6.3.0-18
- char is signed
Raspberry Pi3:
- Raspbian GNU/Linux 9.1 (stretch)
- gcc version 6.3.0 20170516 (Raspbian 6.3.0-18+rpi1)
- ARMv7 Processor rev 4 (v7l)
- libgcc-6-dev: 6.3.0-18+rpi
- char is unsigned
So the only obvious difference is the CPU architecture...
Solution 1:
According to the C11 standard (read n1570), char can be signed or unsigned (so you actually have two flavors of C). Which one you get is implementation-defined.
Some processors, instruction set architectures, and application binary interfaces favor a signed character (byte) type (e.g. because it maps nicely to some machine code instruction); others favor an unsigned one.
gcc even has the -fsigned-char and -funsigned-char options, which you should almost never use (because changing the default breaks some corner cases in calling conventions and ABIs) unless you recompile everything, including your C standard library.
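As a rough sketch of how the choice becomes visible to code: GCC predefines __CHAR_UNSIGNED__ only when plain char is unsigned, so the macro is simply absent on targets where char is signed. The helper names char_flavor and macro_agrees_with_limits below are made up for illustration, and the macro check assumes GCC or a compatible compiler.

```c
#include <limits.h>

/* Hypothetical helper: report which flavor of plain char this build uses.
 * Assumes GCC (or a compatible compiler), which predefines
 * __CHAR_UNSIGNED__ only when plain char is unsigned. */
static const char *char_flavor(void)
{
#ifdef __CHAR_UNSIGNED__
    return "unsigned";
#else
    return "signed";
#endif
}

/* Hypothetical helper: check that the macro and limits.h tell the same story.
 * CHAR_MIN is 0 for an unsigned char and negative for a signed one. */
static int macro_agrees_with_limits(void)
{
#ifdef __CHAR_UNSIGNED__
    return CHAR_MIN == 0;
#else
    return CHAR_MIN < 0;
#endif
}
```

Building the same file once with -fsigned-char and once with -funsigned-char should flip the result of char_flavor(), and adding -funsigned-char to the gcc -dM -E command should make __CHAR_UNSIGNED__ appear in the macro dump.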
You could use feature_test_macros(7) and <endian.h> (see endian(3)) or autoconf on Linux to detect what your system has.
In most cases, you should write portable C code, which does not depend upon those things. And you can find cross-platform libraries (e.g. glib) to help you in that.
BTW, gcc -dM -E -x c /dev/null also gives __BYTE_ORDER__ etc., and if you want an unsigned 8-bit byte you should use <stdint.h> and its uint8_t (more portable and more readable). The standard limits.h also defines CHAR_MIN, SCHAR_MIN, CHAR_MAX and SCHAR_MAX (you could compare them for equality to detect implementations where char is signed), etc.
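The limits.h comparison mentioned above could look roughly like this. It relies only on standard macros; the function names char_is_signed and char_matches_schar are made up for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* True when plain char has a negative minimum, i.e. behaves as a signed type.
 * CHAR_MIN is 0 when char is unsigned. */
static bool char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* The same question asked by comparing the char limits with the
 * signed char limits for equality. */
static bool char_matches_schar(void)
{
    return CHAR_MIN == SCHAR_MIN && CHAR_MAX == SCHAR_MAX;
}
```

Both functions should return the same answer on any conforming implementation; on the two machines from the question, that would be true on the x86 desktop and false on the Raspberry Pi.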
BTW, you should care about character encoding, but most systems today use UTF-8 everywhere. Libraries like libunistring are helpful. Also remember that, practically speaking, a Unicode character encoded in UTF-8 can span several bytes (i.e. several chars).
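To make the "several bytes per character" point concrete, here is a minimal sketch that counts code points in a UTF-8 string by skipping continuation bytes. It assumes the input is valid UTF-8 and does no validation; utf8_code_points is a hypothetical name, not a libunistring function. Note that it reads the bytes through unsigned char, so it gives the same result whichever signedness plain char has.

```c
#include <stddef.h>

/* Hypothetical helper: count Unicode code points in a NUL-terminated
 * UTF-8 string. Continuation bytes have the bit pattern 10xxxxxx, so every
 * byte that does NOT match that pattern starts a new code point.
 * Assumes valid UTF-8; malformed input is not detected. */
static size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For example, the two-byte sequence "\xC3\xA9" (the letter é) counts as one code point even though it occupies two chars.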
Solution 2:
The default depends on the platform and native codeset. For example, machines that use EBCDIC (usually mainframes) must use unsigned char (or have CHAR_BIT > 8) because the C standard requires characters in the basic execution character set to be nonnegative, and EBCDIC uses codes like 240 for the digit 0. (C11 standard, §6.2.5 Types ¶3 says: An object declared as type char is large enough to store any member of the basic execution character set. If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative.)
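The guarantee quoted above can be stated as a compile-time check, and the EBCDIC argument reduces to a small range test. This is only an illustrative sketch: fits_in_signed_8bit_char is a hypothetical helper, and 240 is the EBCDIC code for the digit 0 mentioned above.

```c
#include <limits.h>

/* Members of the basic execution character set must be nonnegative when
 * stored in a char, so these hold on any conforming implementation. */
_Static_assert('0' >= 0 && 'A' >= 0 && 'z' >= 0,
               "basic execution characters must be nonnegative");
_Static_assert('0' <= CHAR_MAX, "basic execution characters must fit in char");

/* Hypothetical helper: would a character code be nonnegative in a signed
 * char that has exactly 8 bits (range -128..127)? */
static int fits_in_signed_8bit_char(int code)
{
    return code >= 0 && code <= 127;
}
```

With this helper, the ASCII code 48 for the digit 0 fits, while the EBCDIC code 240 does not, which is why an 8-bit EBCDIC implementation has to make char unsigned.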
You can control which sign GCC uses with the -fsigned-char or -funsigned-char options. Whether that’s a good idea is a separate discussion.
Solution 3:
The character type char can be signed or unsigned, depending on the platform and compiler.
According to this reference:
The C and C++ standards allows the character type char to be signed or unsigned, depending on the platform and compiler.
Most systems, including x86 GNU/Linux and Microsoft Windows, use signed char,
but those based on PowerPC and ARM processors typically use unsigned char.
This can lead to unexpected results when porting programs between platforms which have different defaults for the type of char.
GCC provides the options -fsigned-char and -funsigned-char to set the default type of char.
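One way such a porting surprise can show up is in a check for bytes with the high bit set. The sketch below uses made-up function names; the "naive" version depends on the signedness of plain char, while the second one converts to unsigned char first and behaves the same on both kinds of platform. It assumes an 8-bit char.

```c
#include <limits.h>

/* Naive check: widens plain char to int and compares with 0x80.
 * Where char is unsigned this detects bytes 0x80..0xFF; where char is
 * signed the widened value is at most 127, so it never fires. */
static int is_non_ascii_naive(char c)
{
    int widened = c;
    return widened >= 0x80;
}

/* Portable check: converting to unsigned char first always yields a value
 * in 0..UCHAR_MAX, independent of the signedness of plain char. */
static int is_non_ascii(char c)
{
    return (unsigned char)c >= 0x80;
}
```

Code like the naive version can work on an ARM build and silently stop matching on x86, which is the kind of difference the quoted passage warns about; going through unsigned char (or uint8_t) avoids relying on the default or on the -fsigned-char and -funsigned-char options.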