printf anomaly after "fork()"

OS: Linux, Language: pure C

I'm moving forward in learning C programming in general, and C programming under UNIX in a special case.

I detected a strange (for me) behaviour of the printf() function after using a fork() call.

Code

#include <stdio.h>
#include <system.h>

int main()
{
    int pid;
    printf( "Hello, my pid is %d", getpid() );

    pid = fork();
    if( pid == 0 )
    {
            printf( "\nI was forked! :D" );
            sleep( 3 );
    }
    else
    {
            waitpid( pid, NULL, 0 );
            printf( "\n%d was forked!", pid );
    }
    return 0;
}

Output

Hello, my pid is 1111
I was forked! :DHello, my pid is 1111
2222 was forked!

Why did the second "Hello" string occur in the child's output?

Yes, it is exactly what the parent printed when it started, with the parent's pid.

But! If we place a \n character at the end of each string we get the expected output:

#include <stdio.h>
#include <system.h>

int main()
{
    int pid;
    printf( "Hello, my pid is %d\n", getpid() ); // SIC!!

    pid = fork();
    if( pid == 0 )
    {
            printf( "I was forked! :D" ); // removed the '\n', no matter
            sleep( 3 );
    }
    else
    {
            waitpid( pid, NULL, 0 );
            printf( "\n%d was forked!", pid );
    }
    return 0;
}

Output:

Hello, my pid is 1111
I was forked! :D
2222 was forked!

Why does it happen? Is it correct behaviour, or is it a bug?


I note that <system.h> is a non-standard header; I replaced it with <unistd.h> and the code compiled cleanly.

When the output of your program is going to a terminal (screen), it is line buffered. When the output of your program goes to a pipe, it is fully buffered. You can control the buffering mode by the Standard C function setvbuf() and the _IOFBF (full buffering), _IOLBF (line buffering) and _IONBF (no buffering) modes.

You could demonstrate this in your revised program by piping the output of your program to, say, cat. Even with the newlines at the end of the printf() strings, you would see the double information. If you send it direct to the terminal, then you will see just the one lot of information.

The moral of the story is to be careful to call fflush(0); to empty all I/O buffers before forking.


Line-by-line analysis, as requested (braces etc removed - and leading spaces removed by markup editor):

  1. printf( "Hello, my pid is %d", getpid() );
  2. pid = fork();
  3. if( pid == 0 )
  4. printf( "\nI was forked! :D" );
  5. sleep( 3 );
  6. else
  7. waitpid( pid, NULL, 0 );
  8. printf( "\n%d was forked!", pid );

The analysis:

  1. Copies "Hello, my pid is 1234" into the buffer for standard output. Because there is no newline at the end and the output is running in line-buffered mode (or full-buffered mode), nothing appears on the terminal.
  2. Gives us two separate processes, with exactly the same material in the stdout buffer.
  3. The child has pid == 0 and executes lines 4 and 5; the parent has a non-zero value for pid (one of the few differences between the two processes - return values from getpid() and getppid() are two more).
  4. Adds a newline and "I was forked! :D" to the output buffer of the child. The first line of output appears on the terminal; the rest is held in the buffer since the output is line buffered.
  5. Everything halts for 3 seconds. After this, the child exits normally through the return at the end of main. At that point, the residual data in the stdout buffer is flushed. This leaves the output position at the end of a line since there is no newline.
  6. The parent comes here.
  7. The parent waits for the child to finish dying.
  8. The parent adds a newline and "1345 was forked!" to the output buffer. The newline flushes the 'Hello' message to the output, after the incomplete line generated by the child.

The parent now exits normally through the return at the end of main, and the residual data is flushed; since there still isn't a newline at the end, the cursor position is after the exclamation mark, and the shell prompt appears on the same line.

What I see is:

Osiris-2 JL: ./xx
Hello, my pid is 37290
I was forked! :DHello, my pid is 37290
37291 was forked!Osiris-2 JL: 
Osiris-2 JL: 

The PID numbers are different - but the overall appearance is clear. Adding newlines to the end of the printf() statements (which becomes standard practice very quickly) alters the output a lot:

#include <stdio.h>
#include <unistd.h>

int main()
{
    int pid;
    printf( "Hello, my pid is %d\n", getpid() );

    pid = fork();
    if( pid == 0 )
        printf( "I was forked! :D %d\n", getpid() );
    else
    {
        waitpid( pid, NULL, 0 );
        printf( "%d was forked!\n", pid );
    }
    return 0;
}

I now get:

Osiris-2 JL: ./xx
Hello, my pid is 37589
I was forked! :D 37590
37590 was forked!
Osiris-2 JL: ./xx | cat
Hello, my pid is 37594
I was forked! :D 37596
Hello, my pid is 37594
37596 was forked!
Osiris-2 JL:

Notice that when the output goes to the terminal, it is line-buffered, so the 'Hello' line appears before the fork() and there was just the one copy. When the output is piped to cat, it is fully-buffered, so nothing appears before the fork() and both processes have the 'Hello' line in the buffer to be flushed.


The reason why is that without the \n at the end of the format string the value is not immediately printed to the screen. Instead it is buffered within the process. This means it is not actually printed until after the fork operation hence you get it printed twice.

Adding the \n though forces the buffer to be flushed and outputted to the screen. This happens before the fork and hence is only printed once.

You can force this to occur by using the fflush method. For example

printf( "Hello, my pid is %d", getpid() );
fflush(stdout);

fork() effectively creates a copy of the process. If, before calling fork(), it had data that was buffered, both the parent and child will have the same buffered data. The next time that each of them does something to flush its buffer (such as printing a newline in the case of terminal output) you will see that buffered output in addition to any new output produced by that process. So if you are going to use stdio in both the parent and child then you should fflush before forking, to ensure that there is no buffered data.

Often, the child is used only to call an exec* function. Since that replaces the complete child process image (including any buffers) there is technically no need to fflush if that is really all that you are going to do in the child. However, if there may be buffered data then you should be careful in how an exec failure is handled. In particular, avoid printing the error to stdout or stderr using any stdio function (write is ok), and then call _exit (or _Exit) rather than calling exit or just returning (which will flush any buffered output). Or avoid the issue altogether by flushing before forking.