Printing floating point numbers from x86-64 seems to require %rbp to be saved

When I write a simple assembly language program, linked with the C library, using gcc 4.6.1 on Ubuntu, and I try to print an integer, it works fine:

        .global main
        .text
main:
        mov     $format, %rdi
        mov     $5, %rsi
        mov     $0, %rax
        call    printf
        ret
format:
        .asciz  "%10d\n"

This prints 5, as expected.

But now if I make a small change, and try to print a floating point value:

        .global main
        .text
main:
        mov     $format, %rdi
        movsd   x, %xmm0
        mov     $1, %rax
        call    printf
        ret
format:
        .asciz  "%10.4f\n"
x:
        .double 15.5

This program seg faults without printing anything. Just a sad segfault.

But I can fix this by pushing and popping %rbp.

        .global main
        .text
main:
        push    %rbp
        mov     $format, %rdi
        movsd   x, %xmm0
        mov     $1, %rax
        call    printf
        pop     %rbp
        ret
format:
        .asciz  "%10.4f\n"
x:
        .double 15.5

Now it works, and prints 15.5000.

My question is: why did pushing and popping %rbp make the application work? According to the ABI, %rbp is one of the registers that the callee must preserve, and so printf cannot be messing it up. In fact, printf worked in the first program, when only an integer was passed to printf. So the problem must be elsewhere?

Solution 1:

I suspect the problem doesn't have anything to do with %rbp, but rather has to do with stack alignment. To quote the ABI:

The ABI requires that stack frames be aligned on 16-byte boundaries. Specifically, the end of the argument area (%rbp+16) must be a multiple of 16. This requirement means that the frame size should be padded out to a multiple of 16 bytes.

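The stack was 16-byte aligned at the point where main() was called, but that call pushed an 8-byte return address, so on entry to main() %rsp is 8 bytes off a 16-byte boundary. If main() then calls printf() without adjusting the stack, printf() starts with a stack that is misaligned by 8 bytes. You restore the alignment by pushing another eight bytes onto the stack (which happen to be %rbp but could just as easily be something else). The integer version most likely survived only by luck: with %al = 0, printf() skips saving the vector registers, and it is that save (done with aligned movaps stores in gcc-compiled variadic functions) that faults on a misaligned stack.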
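To make the arithmetic concrete, here is the question's working program again with %rsp tracked modulo 16 in the comments. This is a sketch that assumes only the ABI guarantee that %rsp is a multiple of 16 at the call into main. It swaps the absolute mov $format, %rdi for RIP-relative lea and call printf@PLT so it also links on toolchains that build position-independent executables by default, and it returns 0 explicitly; neither change is in the original.

```asm
# Question's working program, with %rsp tracked modulo 16.
# Assumption: %rsp mod 16 = 0 at the CALL into main, as the SysV ABI requires.
        .global main
        .text
main:                               # CALL main pushed 8 bytes: %rsp mod 16 = 8
        push    %rbp                # 8 more bytes:             %rsp mod 16 = 0
        lea     format(%rip), %rdi  # 1st integer arg: format string (RIP-relative)
        movsd   x(%rip), %xmm0      # 1st FP arg: the double
        mov     $1, %eax            # %al = number of vector registers used for varargs
        call    printf@PLT          # %rsp mod 16 = 0 here, so printf sees 8 on entry
        pop     %rbp                # back to %rsp mod 16 = 8, return address on top
        xor     %eax, %eax          # return 0 from main
        ret
format:
        .asciz  "%10.4f\n"
x:
        .double 15.5
        .section .note.GNU-stack,"",@progbits   # mark the stack non-executable
```

Assembled with something like gcc prog.s, it should print 15.5000 right-aligned in a 10-character field.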

Here is the code that gcc generates with optimization enabled for the equivalent C program (also viewable on the Godbolt compiler explorer):

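        .section .rodata
.LC1:
        .ascii "%10.4f\12\0"
        .text
        .globl  main
main:
        leaq    .LC1(%rip), %rdi   # format string address
        subq    $8, %rsp           ### align the stack by 16 before a CALL
        movl    $1, %eax           ### 1 FP arg being passed in a register to a variadic function
        movsd   .LC0(%rip), %xmm0  # load the double itself
        call    printf
        xorl    %eax, %eax         # return 0 from main
        addq    $8, %rsp
        ret
        .section .rodata
        .align  8
.LC0:                              # 15.5 as an IEEE-754 double (0x402F000000000000)
        .long   0                  # low 32 bits
        .long   1076822016         # high 32 bits (0x402F0000)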

As you can see, gcc handles the alignment requirement by subtracting 8 from %rsp before the call and adding it back before returning.
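Applied to the hand-written program from the question, the same sub/add approach might look like this. It is a sketch rather than gcc's exact output; as in the annotated version, RIP-relative addressing, call printf@PLT and the explicit return value are adjustments not present in the original question.

```asm
# The question's program, aligned the way gcc does it: adjust %rsp directly.
        .global main
        .text
main:                               # on entry: %rsp mod 16 = 8
        sub     $8, %rsp            # now %rsp mod 16 = 0, ready for a CALL
        lea     format(%rip), %rdi  # format string
        movsd   x(%rip), %xmm0      # the double to print
        mov     $1, %eax            # one vector register carries a variadic arg
        call    printf@PLT
        xor     %eax, %eax          # return 0 from main
        add     $8, %rsp            # undo the adjustment so RET finds the return address
        ret
format:
        .asciz  "%10.4f\n"
x:
        .double 15.5
        .section .note.GNU-stack,"",@progbits   # mark the stack non-executable
```

Nothing is stored in the 8 reserved bytes; they exist only to shift %rsp onto a 16-byte boundary for the call.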

Instead of manipulating %rsp directly, you could do a dummy push/pop of whatever register you like; some compilers do use a dummy push to align the stack, because it can actually be cheaper on modern CPUs and it saves code size.
