What is the behavior of printing NULL with printf's %s specifier?
Came across an interesting interview question:
test 1:
printf("test %s\n", NULL);
printf("test %s\n", NULL);
prints:
test (null)
test (null)
test 2:
printf("%s\n", NULL);
printf("%s\n", NULL);
prints
Segmentation fault (core dumped)
Though this might run fine on some systems, at least mine throws a segmentation fault. What would be the best explanation of this behavior? The above code is in C.
Following is my gcc info:
deep@deep:~$ gcc --version
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
First things first: printf is expecting a valid (i.e. non-NULL) pointer for its %s argument so passing it a NULL is officially undefined. It may print "(null)" or it may delete all files on your hard drive--either is correct behavior as far as ANSI is concerned (at least, that's what Harbison and Steele tells me.)
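Since the behavior is undefined, the portable fix is to never hand a null pointer to %s in the first place. A minimal sketch of that guard follows; safe_str and print_line are hypothetical helper names made up for illustration, not part of the C library.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not a standard function): substitute a
   placeholder so that %s never receives a null pointer. */
static const char *safe_str(const char *s)
{
    return s != NULL ? s : "(null)";
}

/* Prints one line. The argument reaching %s is always a valid string,
   so the behavior is defined whether the compiler emits a real printf
   call or rewrites it as puts. */
static void print_line(const char *s)
{
    printf("%s\n", safe_str(s));
}
```

With this in place, print_line(NULL) prints the placeholder text on every conforming implementation instead of depending on what a particular printf or puts happens to do.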
That being said, yeah, this is really weird behavior. It turns out that what's happening is that when you do a simple printf like this:
printf("%s\n", NULL);
gcc is (ahem) smart enough to deconstruct this into a call to puts. The first printf, this:
printf("test %s\n", NULL);
is complicated enough that gcc will instead emit a call to real printf.
(Notice that gcc emits warnings about your invalid printf argument when you compile. That's because it long ago developed the ability to parse *printf format strings.)
You can see this yourself by compiling with the -save-temps option and then looking through the resulting .s file.
When I compiled the first example, I got:
movl $.LC0, %eax
movl $0, %esi
movq %rax, %rdi
movl $0, %eax
call printf ; <-- Actually calls printf!
(Comments were added by me.)
But the second one produced this code:
movl $0, %edi ; Stores NULL in the puts argument list
call puts ; Calls puts
The weird thing is that it doesn't print the following newline. It's as though it's figured out that this is going to cause a segfault so it doesn't bother. (Which it has--it warned me when I compiled it.)
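To compare the two code paths without relying on undefined behavior, a small file like the following could be compiled with -save-temps (or -S) and the resulting .s file inspected. The function names with_prefix and bare are made up for illustration, and whether the second call really becomes puts depends on the gcc version and flags in use, so treat that outcome as an assumption rather than a guarantee.

```c
#include <stdio.h>

/* The format string has text besides "%s\n", so this is expected
   to remain a call to printf in the generated assembly. */
void with_prefix(const char *s)
{
    printf("test %s\n", s);
}

/* The format string is exactly "%s\n" and the return value is unused:
   this is the form that gcc may rewrite as puts(s). */
void bare(const char *s)
{
    printf("%s\n", s);
}
```

Called with a valid string, both functions print normally. The difference only shows up in the assembly, where bare would be expected to contain a call to puts and with_prefix a call to printf.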
As far as the C language is concerned, the reason is that you're invoking undefined behavior and anything can happen.
As for the mechanics of why this is happening, modern gcc optimizes printf("%s\n", x) to puts(x), and puts does not have the silly code to print (null) when it sees a null pointer, whereas common implementations of printf have this special case. Since gcc can't optimize (in general) non-trivial format strings like this, printf actually gets called when the format string has other text present in it.