Why do we need stack allocation when we have a red zone?

I have the following doubts:

As we know System V x86-64 ABI gives us about a fixed-size area (128 bytes) in the stack frame, so called redzone. So, as a result we don't need to use, for example, sub rsp, 12. Just make mov [rsp-12], X and that's all.

But I cannot grasp idea of that. Why does it matter? Is it necessary to sub rsp, 12 without redzone? After all, stack size is limited at the beginning so why sub rsp, 12 is important? I know that it makes possible us to follow the top of the stack but let's ignore it at that moment.

I know what some instructions use rsp value ( like ret) but don't care about it in that moment.

The crux of the problem is: We have no redzone and we've done:

function:
    mov [rsp-16], rcx
    mov [rsp-32], rcx
    mov [rsp-128], rcx
    mov [rsp-1024], rcx
    ret

Is it difference with?

function:
    sub rsp, 1024
    mov [rsp-16], rcx
    mov [rsp-32], rcx
    mov [rsp-128], rcx
    mov [rsp-1024], rcx
    add rsp, 1024
    ret

Solution 1:

The "red zone" is not strictly necessary. In your terms, it could be considered "pointless." Everything that you could do using the red zone, you could also do the traditional way that you did it targeting the IA-32 ABI.

Here's what the AMD64 ABI says about the "red zone":

The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers. Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as the red zone.

The real purpose of the red zone is as an optimization. Its existence allows code to assume that the 128 bytes below rsp will not be asynchronously clobbered by signals or interrupt handlers, which makes it possible to use it as scratch space. This makes it unnecessary to explicitly create scratch space on the stack by moving the stack pointer in rsp. This is an optimization because the instructions to decrement and restore rsp can now be elided, saving time and space.

So yes, while you could do this with AMD64 (and would need to do it with IA-32):

function:
    push rbp                      ; standard "prologue" to save the
    mov  rbp, rsp                 ;   original value of rsp

    sub  rsp, 32                  ; reserve scratch area on stack
    mov  QWORD PTR [rsp],   rcx   ; copy rcx into our scratch area
    mov  QWORD PTR [rsp+8], rdx   ; copy rdx into our scratch area

    ; ...do something that clobbers rcx and rdx...

    mov  rcx, [rsp]               ; retrieve original value of rcx from our scratch area
    mov  rdx, [rsp+8]             ; retrieve original value of rdx from our scratch area
    add  rsp, 32                  ; give back the stack space we used as scratch area

    pop  rbp                      ; standard "epilogue" to restore rsp
    ret

we don't need to do it in cases where we only need a 128-byte scratch area (or smaller), because then we can use the red zone as our scratch area.

Plus, since we no longer have to decrement the stack pointer, we can use rsp as the base pointer (instead of rbp), making it unnecessary to save and restore rbp (in the prologue and epilogue), and also freeing up rbp for use as another general-purpose register!

(Technically, turning on frame-pointer omission (-fomit-frame-pointer, enabled by default with -O1 since the ABI allows it) would also make it possible for the compiler to elide the prologue and epilogue sections, with the same benefits. However, absent a red zone, the need to adjust the stack pointer to reserve space would not change.)

Note, however, that the ABI only guarantees that asynchronous things like signals and interrupt handlers not modify the red zone. Calls to other functions may clobber values in the red zone, so it is not particularly useful except in leaf functions (which those functions that do not call any other functions, as if they were at the "leaf" of a function-call tree).


A final point: the Windows x64 ABI deviates slightly from the AMD64 ABI used on other operating systems. In particular, it has no concept of a "red zone". The area beyond rsp is considered volatile and subject to be overwritten at any time. Instead, it requires that the caller allocate a home address space on the stack, which is then available for the callee's use in the event that it needs to spill any of the register-passed parameters.

Solution 2:

You have the offsets the wrong way around in your example, which is why it does not make sense. Code should not access the region below the stack pointer - it is undefined. The red-zone is there to protect the first 128 bytes below the stack pointer. Your second example should read:

function:
    sub rsp, 1024
    mov [rsp+16], rcx
    mov [rsp+32], rcx
    mov [rsp+128], rcx
    mov [rsp+1016], rcx
    add rsp, 1024
    ret

If the amount of scratch space that a function needs is up to 128 bytes then it can use addresses below the stack pointer without needing to adjust the stack: this is the optimisation. Compare:

function:        // Not using red-zone.
    sub rsp, 128
    mov [rsp+120], rcx
    add rsp, 128
    ret

With the same code using the red-zone:

function:        // Using the red-zone, no adjustment of stack
    mov [rsp-8], rcx
    ret

The confusion about the offsets from the stack pointer is normally caused because compilers generate negative offsets from the frame (RBP), not positive offsets from the stack (RSP).