Difference between int and char in getchar/fgetc and putchar/fputc?
TL;DR:
- char c; c = getchar(); is wrong, broken and buggy.
- int c; c = getchar(); is correct.
This applies to getc and fgetc as well, if not even more so, because one would often read until the end of the file.
Always store the return value of getchar (fgetc, getc...) (and putchar) initially into a variable of type int.
The argument to putchar can be any of int, char, signed char or unsigned char; its type doesn't matter, and all of them work the same, even though one might result in a positive and another in a negative integer being passed for characters at or above \200 (128).
The reason why you must use int to store the return value of both getchar and putchar is that both of them can return the value of the macro EOF, which is a negative integer constant (usually -1): getchar returns it when the end-of-file condition is reached or a read error occurs, and putchar returns it on a write error.
For getchar, if the return value is not EOF, it is the unsigned char that was read, zero-extended to an int. That is, assuming 8-bit characters, the values returned can be 0...255 or the value of the macro EOF; again assuming 8-bit char, there is no way to squeeze these 257 distinct values into 256 so that each of them could be identified uniquely.
Now, if you stored it into char instead, the effect would depend on whether the character type is signed or unsigned by default! This varies from compiler to compiler and from architecture to architecture. If char is signed and EOF is defined as -1, then both EOF and the character '\377' on input would compare equal to EOF; they'd be sign-extended to (int)-1.
On the other hand, if char is unsigned (as it is by default on ARM processors, including Raspberry Pi systems, and seems to be on AIX too), there is no value that could be stored in c that would compare equal to -1, including EOF itself; instead of breaking out on EOF, your code would output a \377 character for every EOF it reads and keep looping.
The danger here is that with signed chars the code seems to be working correctly even though it is still horribly broken: one of the legal input values is interpreted as EOF. Furthermore, C89, C99 and C11 do not mandate a value for EOF; they only say that EOF is a negative integer constant. Thus instead of -1 it could as well be, say, -224 on a particular implementation. Such a value does not even fit in an 8-bit signed char: stored there it would typically wrap around to 32, the code of the space character, which no longer compares equal to EOF, so the loop would print a space and never terminate.
gcc has the switch -funsigned-char, which can be used to make char unsigned on those platforms where it defaults to signed:
% cat test.c
#include <stdio.h>
int main(void)
{
    char c;
    printf("Enter characters : ");
    while ((c = getchar()) != EOF) {
        putchar(c);
    }
    return 0;
}
Now we run it with signed char:
% gcc test.c && ./a.out
Enter characters : sfdasadfdsaf
sfdasadfdsaf
^D
%
Seems to be working right. But with unsigned char:
% gcc test.c -funsigned-char && ./a.out
Enter characters : Hello world
Hello world
���������������������������^C
%
That is, I tried to press Ctrl-D there many times, but a � was printed for each EOF instead of breaking the loop.
Now, again, for the signed char case, it cannot distinguish between char 255 and EOF on Linux, breaking it for binary data and such:
% gcc test.c && echo -e 'Hello world\0377And some more' | ./a.out
Enter characters : Hello world
%
Only the first part up to the \0377 escape was written to stdout.
Beware that comparisons between character constants and an int containing the unsigned character value might not work as expected (e.g. the character constant 'ä' in ISO 8859-1 would mean the signed value -28). So, assuming that you write code that reads input until 'ä' in the ISO 8859-1 codepage, you'd do:
int c;
while ((c = getchar()) != EOF) {
    if (c == (unsigned char)'ä') {
        /* ... */
    }
}
Due to integer promotion, all char values fit into an int and are automatically promoted on function calls, so you can pass any of int, char, signed char or unsigned char to putchar as an argument (though not use them to store its return value), and it will work as expected.
The actual value passed in the integer might be positive or even negative; for example, the character constant \377 would be negative on an 8-bit-char system where char is signed. However, putchar (or fputc, actually) will convert the value to an unsigned char. C11 7.21.7.3p2:
2 The fputc function writes the character specified by c (converted to an unsigned char) to the output stream pointed to by stream [...]
(emphasis mine)
I.e. fputc is guaranteed to convert the given c as if by (unsigned char)c.
Always use int to store the character returned by getchar(), as the EOF constant is of type int. If you use char, the comparison against EOF is not correct.
You can safely pass a char to putchar(), though, as it will be promoted to int automatically.
Note:
Technically, using char will work in most cases, but then you can't have a 0xFF character in the input, as it will be interpreted as EOF due to type conversion. To cover all cases, always use int. As @Ilja put it, int is needed to represent all 256 possible character values and EOF, which is 257 possible values in total and cannot be stored in the char type.