Why can't I mmap(MAP_FIXED) the highest virtual page in a 32-bit Linux process on a 64-bit kernel?
Solution 1:
The mmap function eventually calls either do_mmap or do_brk_flags which do the actual work of satisfying the memory allocation request. These functions in turn call get_unmapped_area. It is in that function that the checks are made to ensure that memory cannot be allocated beyond the user address space limit, which is defined by TASK_SIZE. I quote from the code:
* There are a few constraints that determine this:
*
* On Intel CPUs, if a SYSCALL instruction is at the highest canonical
* address, then that syscall will enter the kernel with a
* non-canonical return address, and SYSRET will explode dangerously.
* We avoid this particular problem by preventing anything executable
* from being mapped at the maximum canonical address.
*
* On AMD CPUs in the Ryzen family, there's a nasty bug in which the
* CPUs malfunction if they execute code from the highest canonical page.
* They'll speculate right off the end of the canonical space, and
* bad things happen. This is worked around in the same way as the
* Intel problem.
#define TASK_SIZE_MAX ((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE)
#define IA32_PAGE_OFFSET ((current->personality & ADDR_LIMIT_3GB) ? \
0xc0000000 : 0xFFFFe000)
#define TASK_SIZE (test_thread_flag(TIF_ADDR32) ? \
IA32_PAGE_OFFSET : TASK_SIZE_MAX)
On processors with 48-bit virtual address spaces, __VIRTUAL_MASK_SHIFT
is 47.
Note that TASK_SIZE
is specified depending on whether the current process is 32-bit on 32-bit, 32-bit on 64-bit, 64-bit on 64-bit. For 32-bit processes, two pages are reserved; one for the vsyscall page and the other used as a guard page. Essentially, the vsyscall page cannot be unmapped and so the highest address of the user address space is effectively 0xFFFFe000. For 64-bit processes, one guard page is reserved. These pages are only reserved on 64-bit Intel and AMD processors because only on these processors the SYSCALL
mechanism is used.
Here is the check that is performed in get_unmapped_area
:
if (addr > TASK_SIZE - len)
return -ENOMEM;