Linux process stack overrun by local variables (stack guarding)
Solution 1:
_chkstk
does stack probes to make sure each page is touched in order after a (potentially) large allocation, e.g. an alloca. Because Windows will only grow the stack one page at a time up to the stack size limit.
Touching that "guard page" triggers stack growth. It doesn't guard against stack overflow; I think you're misinterpreting the meaning of "guard page" in this usage.
The function name is also potentially misleading. _chkstk
docs simply say: Called by the compiler when you have more than one page of local variables in your function. It doesn't truly check anything, it just makes sure that intervening pages have been touched before memory around esp
/rsp
gets used. i.e. the only possible effects are: nothing (possibly including a valid soft page fault) or an invalid page-fault on stack overflow (trying to touch a page that Windows refused to grow the stack to include.) It ensures that the stack pages are allocated by unconditionally writing them.
I guess you could look at this as checking for a stack clash by making sure you touch an unmappable page before continuing in the case of stack overflow.
Linux will grow the main-thread stack1 by any number of pages (up to the stack size limit set by ulimit -s
; default 8MiB) when you touch memory below old stack pages if it's above the current stack pointer.
If you touch memory outside the growth limit, or don't move the stack pointer first, it will just segfault. Thus Linux doesn't need stack probes, merely to move the stack pointer by as many bytes as you want to reserve. Compilers know this and emit code accordingly.
See also How is Stack memory allocated when using 'push' or 'sub' x86 instructions? for more low-level details on what the Linux kernel does, and what glibc pthreads on Linux does.
A sufficiently large alloca
on Linux can move the stack all the way past the bottom of the stack growth region, beyond the guard pages below that, and into another mapping; this is a Stack Clash. https://blog.qualys.com/securitylabs/2017/06/19/the-stack-clash It of course requires that the program uses a potentially-huge size for alloca, dependent on user input. The mitigation for CVE-2017-1000364 is to leave a 1MiB guard region, requiring a much larger alloca than normal to get past the guard pages.
This 1MiB guard region is below the ulimit -s
(8MiB) growth limit, not below the current stack pointer. It's separate from Linux's normal stack growth mechanism.
gcc -fstack-check
The effect of gcc -fstack-check
is essentially the same as what's always needed on Windows (which MSVC does by calling _chkstk
): touch stack pages in between previous and new stack pointer when moving it by a large or runtime-variable amount.
But the purpose / benefit of these probes is different on Linux; it's never needed for correctness in a bug-free program on GNU/Linux. It "only" defends against stack-clash bugs/exploits.
On x86-64 GNU/Linux, gcc -fstack-check
will (for functions with a VLA or large fixe-size array) add a loop that does stack probes with or qword ptr [rsp], 0
along with sub rsp,4096
. For known fixed array sizes, it can be just a single probe. The code-gen doesn't look very efficient; it's normally never used on this target. (Godbolt compiler explorer example that passes a stack array to a non-inline function.)
https://gcc.gnu.org/onlinedocs/gccint/Stack-Checking.html describes some GCC internal parameters that control what -fstack-check
does.
If you want absolute safety against stack-clash attacks, this should do it. It's not needed for normal operation, though, and a 1MiB guard page is enough for most people.
Note that -fstack-protector-strong
is completely different, and guards against overwrite of the return address by buffer overruns on local arrays. Nothing to do with stack clashes, and the attack is against stuff already on the stack above a small local array, not against other regions of memory by moving the stack a lot.
Footnote 1: Thread stacks on Linux (for threads other than the initial one) have to be fully allocated up front because the magic growth feature doesn't work. Only the initial aka main thread of a process can have that.
(There's an mmap(MAP_GROWSDOWN)
feature but it's not safe because there's no limit, and because nothing stops other dynamic allocations from randomly picking a page close below the current stack, limiting future growth to a tiny size before a stack clash. Also because it only grows if you touch the guard page, so it would need stack probes. For these showstopper reasons, MAP_GROWSDOWN
is not used for thread stacks. The internal mechanism for the main stack relies on different magic in the kernel which does prevent other allocations from stealing space.)