I'm trying to understand getchar() != EOF
I'm reading The C Programming Language and have understood everything so far.
However when I came across the getchar()
and putchar()
, I failed to understand what they are for and, more specifically, what the following code does.
main()
{
    int c;
    while ((c = getchar()) != EOF)
        putchar(c);
}
I understand the main()
function, the declaration of the integer c
and the while
loop. Yet I'm confused about the condition inside of the while
loop. What is the input in this C code, and what is the output?
Sorry if this is a basic and stupid question, but I'm just looking for a simple explanation before I move on in the book and become more confused.
Solution 1:
This code can be written more clearly as:
main()
{
    int c;
    while (1) {
        c = getchar();            // Get one character from the input
        if (c == EOF) { break; }  // Exit the loop if we receive EOF ("end of file")
        putchar(c);               // Put the character to the output
    }
}
The EOF
value is returned when there is no more input; it is a special return value, not an actual character in the input. The name makes more sense when the input is read from a real file rather than typed by the user (interactive input is a special case of a file).
[As an aside, generally the
main
function should be written as int main(void)
.]
Solution 2:
getchar()
is a function that reads a character from standard input. EOF
is a special value (not a character) that C input functions return to indicate that the END OF FILE has been reached.
Usually you will get EOF
returned from getchar()
when your standard input is something other than the console (e.g., a file).
If you run your program in unix like this:
$ cat somefile | ./your_program
Then your getchar()
will return every single character in somefile
and EOF
as soon as somefile
ends.
If you run your program like this:
$ ./your_program
and signal end-of-file through the console (by hitting CTRL+D
on Unix or CTRL+Z on Windows), then getchar()
will also return EOF
and the execution will end.
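The same read-until-EOF loop works whether the input is a file, a pipe, or the console. As a minimal sketch, the hypothetical helper count_chars below (not part of the original answer) counts characters with getc, which behaves like getchar() but takes the stream as an argument:

```c
#include <stdio.h>

/* Count the characters that can be read from `in` before getc() returns EOF.
   count_chars is a hypothetical helper used only for illustration. */
unsigned long count_chars(FILE *in)
{
    unsigned long n = 0;
    int c;  /* int, not char, so that EOF stays distinguishable */

    while ((c = getc(in)) != EOF)
        n++;
    return n;
}
```

Calling count_chars(stdin) from main and printing the result with %lu would report how many characters arrived before end-of-file, however the input was supplied.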
Solution 3:
The code written with current C standards should be
#include <stdio.h>
int main(void)
{
    int c;
    while ((c = getchar()) != EOF)
        putchar(c);
}
The loop could be rewritten as
int c;
while (1) {
    c = getchar();
    if (c != EOF)
        putchar(c);
    else
        break;
}
This reads as
- repeat forever
  - get the next character ("byte") of input from standard input and store it into c
  - if no exceptional condition occurred while reading the said character
    - then output the character stored in c to standard output
  - else
    - break the loop
Many programming languages handle exceptional conditions by raising exceptions that break the normal program flow. C does no such thing. Instead, functions that can fail have a return value, and any exceptional condition is signalled by a special return value, which you need to look up in the documentation of the given function. In the case of getchar
, the documentation from the C11 standard says (C11 7.21.7.6p3):
- The getchar function returns the next character from the input stream pointed to by stdin. If the stream is at end-of-file, the end-of-file indicator for the stream is set and getchar returns EOF. If a read error occurs, the error indicator for the stream is set and getchar returns EOF.
It is stated elsewhere that EOF
is an integer constant that is < 0, and any ordinary return value is >= 0 - the unsigned char
zero-extended to an int
.
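To make the value ranges concrete, here is a small sketch. The helper name returns_in_char_range is hypothetical. It checks that every value read from a stream before EOF lies between 0 and UCHAR_MAX, so even a byte such as 0xFF comes back as a non-negative int and cannot be confused with the negative EOF:

```c
#include <limits.h>
#include <stdio.h>

/* Return 1 if every value getc() produced before EOF was in the range
   0..UCHAR_MAX, and 0 otherwise. Hypothetical helper for illustration. */
int returns_in_char_range(FILE *in)
{
    int c;

    while ((c = getc(in)) != EOF) {
        if (c < 0 || c > UCHAR_MAX)
            return 0;
    }
    return 1;
}
```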
The stream being at end-of-file means that all of the input has been consumed. For standard input it is possible to cause this from the keyboard by typing Ctrl+D on Unix/Linux terminals and Ctrl+Z in Windows console windows. Another possibility would be for the program to receive the input from a file or a pipe instead of from the keyboard; then end-of-file would be signalled as soon as that input was fully consumed, e.g.
cat file | ./myprogram
or
./myprogram < file
As the above fragment says, there are actually two different conditions that can cause getchar
to return EOF
: either the end-of-file was reached, or an actual error occurred. This cannot be deduced from the return value alone. Instead you must use the functions feof
and ferror
. feof(stdin)
would return a true value if end-of-file was reached on the standard input. ferror(stdin)
would return true if an error occurred.
If an actual error occurred, the variable errno
defined by <errno.h>
would contain the error code; the function perror
can be used to automatically display a human readable error message with a prefix. Thus we could expand the example to
#include <stdio.h>
#include <errno.h> // for the definition of errno
#include <stdlib.h> // for exit()
int main(void)
{
    int c;
    while ((c = getchar()) != EOF)
        putchar(c);
    if (feof(stdin)) {
        printf("end-of-file reached\n");
        exit(0);
    }
    else if (ferror(stdin)) {
        printf("An error occurred. errno set to %d\n", errno);
        perror("Human readable explanation");
        exit(1);
    }
    else {
        printf("This should never happen...\n");
        exit('?');
    }
}
To trigger the end-of-file, one would use Ctrl+D (here displayed as ^D
) on a new line on Linux:
% ./a.out
Hello world
Hello world
^D
end-of-file reached
(notice how the input here is line-buffered, so the input is not interleaved within the line with output).
Likewise, we can get the same effect by using a pipeline.
% echo Hello world | ./a.out
Hello world
end-of-file reached
Triggering an error is a bit trickier. In bash
and zsh
shells the standard input can be closed so that it doesn't come from anywhere, by appending <&-
to the command line:
% ./a.out <&-
An error occurred. errno set to 9
Human readable explanation: Bad file descriptor
Bad file descriptor, or EBADF
means that the standard input (file descriptor number 0) was invalid, as it was not opened at all.
Another fun way to generate an error would be to read the standard input from a directory - this causes errno to be set to EISDIR
on Linux:
% ./a.out < /
An error occurred. errno set to 21
Human readable explanation: Is a directory
Actually the return value of putchar
should be checked too - it likewise
returns EOF
on error, or the character written:
while ((c = getchar()) != EOF) {
    if (putchar(c) == EOF) {
        perror("putchar failed");
        exit(1);
    }
}
And now we can test this by redirecting the standard output to /dev/full
- however there is a gotcha - since standard output is buffered we need to write enough to cause the buffer to flush right away and not at the end of the program. We get infinite zero bytes from /dev/zero
:
% ./a.out < /dev/zero > /dev/full
putchar failed: No space left on device
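Because standard output is buffered, a write error on a short input may only surface when the buffer is finally flushed. One way to catch that, sketched here with a hypothetical helper copy_stream and return codes of my own choosing, is to check fflush after the loop as well:

```c
#include <stdio.h>

/* Copy `in` to `out`. Returns 0 on success, -1 on a read error and
   -2 on a write error. copy_stream and its return codes are hypothetical. */
int copy_stream(FILE *in, FILE *out)
{
    int c;

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -2;
    }
    if (ferror(in))
        return -1;  /* EOF was caused by an error, not by end-of-file */

    /* Buffered output may fail only when it is actually written out. */
    if (fflush(out) == EOF)
        return -2;
    return 0;
}
```

A main function could call copy_stream(stdin, stdout) and use perror when the result is non-zero; with this check, even a one-line input redirected to /dev/full should be reported as a failure.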
P.S. it is very important to always use a variable of type int
to store the return value of getchar()
. Even though it reads a character, using signed
/unsigned
/plain char
is always wrong.
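A short sketch of why char fails, using two hypothetical helpers. It assumes the common case where EOF is -1 and narrowing 0xFF to signed char yields -1; neither is guaranteed by the standard. With a signed char, a legitimate 0xFF byte is mistaken for EOF and the loop ends early. With an unsigned char, the stored value never compares equal to EOF, so the loop never ends:

```c
#include <stdio.h>

/* Return 1 if `byte`, once stored in a signed char, compares equal to EOF.
   Narrowing values above SCHAR_MAX is implementation-defined. */
int signed_char_hits_eof(unsigned char byte)
{
    signed char c = (signed char)byte;
    return c == EOF;
}

/* Return 1 if EOF, once stored in an unsigned char, still compares equal
   to EOF. The stored value promotes to a non-negative int, EOF is negative. */
int unsigned_char_keeps_eof(void)
{
    unsigned char c = (unsigned char)EOF;
    return c == EOF;
}
```

Plain char behaves like one of these two cases, depending on the implementation.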
Solution 4:
Maybe you got confused by the fact that entering -1 on the command line does not end your program? Because getchar()
reads this as two chars, - and 1. In the assignment to c, the character is converted to the ASCII numeric value. This numeric value is stored in some memory location, accessed by c.
Then putchar(c)
retrieves this value, looks up the ASCII table and converts back to character, which is printed.
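As a sketch of that point, the hypothetical helper read_codes below collects the values getc returns. For the input "-1" followed by a newline, an ASCII system would give 45 ('-'), 49 ('1') and 10 (newline), none of which equals EOF:

```c
#include <stddef.h>
#include <stdio.h>

/* Store up to `max` values returned by getc() in `codes` and return how
   many were stored. Reading stops at EOF. Hypothetical helper. */
size_t read_codes(FILE *in, int *codes, size_t max)
{
    size_t n = 0;
    int c;

    while (n < max && (c = getc(in)) != EOF)
        codes[n++] = c;
    return n;
}
```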
I guess finding the value -1 decimal in the ASCII table is impossible, because the table starts at 0. So getchar()
has to account for the different solutions on different platforms. Maybe there is a getchar()
version for each platform?
I just find it strange that this EOF is not in regular ASCII. It could have been one of the first characters, which are not printable. For instance, end-of-line is in ASCII.
What happens if you transfer your file from Windows to Linux? Will the EOF character be automatically updated?