Why does the compiler reserve a little stack space but not the whole array size?
Below the stack pointer there is a 128-byte red zone that is reserved for the function's own use (signal handlers will not overwrite it). Since main calls no other function, it doesn't need to move the stack pointer far enough to cover the whole array, though doing so would be harmless here. It only subtracts enough from rsp that the lowest part of the array still lies within the red zone.
You can see the difference by adding a function call to main:
int test() {
    int arr[120];
    return arr[0]+arr[119];
}
int main() {
    int arr[120];
    test();
    return arr[0]+arr[119];
}
This gives:
test:
    push rbp
    mov rbp, rsp
    sub rsp, 360
    mov edx, DWORD PTR [rbp-480]
    mov eax, DWORD PTR [rbp-4]
    add eax, edx
    leave
    ret
main:
    push rbp
    mov rbp, rsp
    sub rsp, 480
    mov eax, 0
    call test
    mov edx, DWORD PTR [rbp-480]
    mov eax, DWORD PTR [rbp-4]
    add eax, edx
    leave
    ret
You can see that main subtracts 480 from rsp because the whole array has to stay inside its own stack frame while test runs, but test doesn't need to do that because it doesn't call any functions.
The extra reads of arr[0] and arr[119] don't significantly change the output; they are there to make it clear that the compiler isn't simply pretending those elements don't exist.
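To compare the two listings above, here is a small C sketch that converts the rbp-relative address of arr[0] into an rsp-relative one and checks whether it lands in the red zone. The helper names rsp_offset and in_red_zone are made up for illustration; the numbers 360, 480 and -480 are taken from the listings, and the 128-byte red zone is the one from the x86-64 System V ABI.

```c
#include <assert.h>
#include <stdbool.h>

enum { RED_ZONE_BYTES = 128 };  /* x86-64 System V red zone below rsp */

/* After "mov rbp, rsp; sub rsp, frame_bytes" we have rsp = rbp - frame_bytes,
   so the address rbp + rbp_off equals rsp + (rbp_off + frame_bytes). */
long rsp_offset(long rbp_off, long frame_bytes) {
    assert(frame_bytes >= 0);
    return rbp_off + frame_bytes;
}

/* True if an rsp-relative offset lies in the red zone [rsp-128, rsp). */
bool in_red_zone(long rsp_off) {
    return rsp_off < 0 && rsp_off >= -RED_ZONE_BYTES;
}
```

With the values from test, rsp_offset(-480, 360) is -120, which is inside the red zone. With the values from main, rsp_offset(-480, 480) is 0, so the whole array sits at or above rsp and the call to test cannot overwrite it.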
You're on x86-64 Linux, where the ABI includes a red-zone (128 bytes below RSP); see https://stackoverflow.com/tags/red-zone/info.
So the array goes from near the bottom of the red-zone up to near the top of what gcc reserved. Compile with -mno-red-zone to see different code-gen.
Also, your compiler is using RSP, not ESP. ESP is the low 32 bits of RSP, and on x86-64 the stack normally lives at an address outside the low 32 bits of address space, so the program would crash if you truncated RSP to 32 bits.
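As a rough illustration of the truncation problem, the C sketch below keeps only the low 32 bits of a 64-bit value. The function name truncate_to_32 and the address 0x00007ffd12345678 are made up for this example; the address only has the general shape of a Linux user-space stack address.

```c
#include <stdint.h>

/* Keep only the low 32 bits of a 64-bit register value, which is what
   using ESP in place of RSP would amount to. */
uint64_t truncate_to_32(uint64_t reg) {
    return (uint64_t)(uint32_t)reg;
}
```

For the hypothetical value 0x00007ffd12345678, truncate_to_32 gives 0x12345678: the upper bits are gone, so the result points somewhere else entirely.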
On the Godbolt compiler explorer, I get this from gcc -O3 (with gcc 6.3, 7.3, and 8.1):
main:
    sub rsp, 368
    mov eax, DWORD PTR [rsp-120] # rsp-120 is inside the 128-byte red-zone; -480 would be outside it
    add rsp, 368
    ret
Did you fake your asm output, or does some other version of gcc or some other compiler really load from outside the red-zone on this undefined behaviour (reading an uninitialized array element)? clang just compiles it to ret, and ICC just returns 0 without loading anything. (Isn't undefined behaviour fun?)
int ext(int*);
int foo() {
    int arr[120]; // can't use the red-zone because of later non-inline function call
    ext(arr);
    return arr[0];
}
# gcc. clang and ICC are similar.
    sub rsp, 488
    mov rdi, rsp
    call ext
    mov eax, DWORD PTR [rsp]
    add rsp, 488
    ret
But we can avoid UB in a leaf function without letting the compiler optimize away the store/reload. (We could maybe just use volatile instead of inline asm.)
int bar() {
    int arr[120];
    asm("nop # operand was %0" :"=m" (arr[0]) ); // tell the compiler we write arr[0]
    return arr[0];
}
# gcc output
bar:
    sub rsp, 368
    nop # operand was DWORD PTR [rsp-120]
    mov eax, DWORD PTR [rsp-120]
    add rsp, 368
    ret
Note that the compiler only assumes we wrote arr[0], not any of arr[1..119].
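The volatile alternative mentioned above might look something like the sketch below. The name bar_volatile is hypothetical, and I haven't checked what asm any particular compiler generates for it or where it places the array. Unlike bar, it performs a real store before the load, so the returned value is well defined.

```c
/* Hypothetical leaf function: a real store to a volatile element, so the
   later load reads an initialized value (no UB). */
int bar_volatile(void) {
    volatile int arr[120];
    arr[0] = 42;      /* volatile store: the compiler must emit it */
    return arr[0];    /* volatile load: the compiler must emit it */
}
```

Because both accesses are volatile, the compiler is not allowed to fold them into a constant return, which is the property the inline-asm version was after.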
But anyway, gcc/clang/ICC all put the bottom of the array in the red-zone. See the Godbolt link.
This is a good thing in general: more of the array is within range of a disp8 from RSP, so references to arr[0] up to arr[63] or so could use [rsp+disp8] instead of [rsp+disp32] addressing modes. Not super useful for one big array, but as a general algorithm for allocating locals on the stack it makes total sense. (gcc doesn't go all the way to the bottom of the red-zone for arr, but clang does, using sub rsp, 360 instead of 368 so the array is still 16-byte aligned. IIRC, the x86-64 System V ABI at least recommends this for arrays with automatic storage with size >= 16 bytes.)
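To put numbers on the disp8 argument, here is a small C sketch that finds the highest array index still reachable with an 8-bit displacement from RSP. The function names are made up for illustration, and it assumes 4-byte int elements. The base offsets are the ones discussed above: -120 for gcc's layout, -128 for an array starting at the very bottom of the red zone as described for clang, and 0 for an array starting at RSP itself.

```c
#include <assert.h>
#include <stdbool.h>

/* A disp8 is a sign-extended 8-bit displacement: -128..127. */
bool fits_disp8(long disp) {
    return disp >= -128 && disp <= 127;
}

/* Highest index i such that arr[i] is addressable as [rsp+disp8], given
   arr[0] at rsp + base_off and 4-byte int elements. Returns -1 if none. */
int last_disp8_index(long base_off, int count) {
    assert(count >= 0);
    int last = -1;
    for (int i = 0; i < count; i++) {
        if (fits_disp8(base_off + 4L * i))
            last = i;
    }
    return last;
}
```

last_disp8_index(-120, 120) is 61 and last_disp8_index(-128, 120) is 63, while an array starting at RSP (base offset 0) only reaches index 31. Starting the array in the red zone roughly doubles the number of elements that can be addressed with the shorter encoding.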