gfortran for dummies: What does mcmodel=medium do exactly?
Solution 1:
Since bar
is quite large, the compiler generates a static allocation instead of an automatic allocation on the stack. Static arrays are created with the .comm
assembly directive, which creates an allocation in the so-called COMMON section. Symbols from that section are gathered, same-named symbols are merged (reduced to one symbol request with size equal to the largest size requested), and what remains is mapped to the BSS (uninitialised data) section in most executable formats. With ELF executables the .bss
section is located in the data segment, just before the data segment part of the heap (there is another heap part managed by anonymous memory mappings which does not reside in the data segment).
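The merging rule for COMMON symbols described above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not how a real linker is implemented: the function name merge_common and the (name, size) request format are invented for illustration, and only the 12-byte baz_ request comes from the assembly in this answer. The 8-byte baz_ request and the qux_ symbol are hypothetical.

```python
def merge_common(requests):
    """Merge same-named COMMON requests, keeping the largest size per name."""
    merged = {}
    for name, size in requests:
        # A later request only matters if it asks for more space.
        merged[name] = max(size, merged.get(name, 0))
    return merged


# baz_ with 12 bytes is the request seen in this answer; the other two
# entries are hypothetical requests from imaginary object files.
requests = [("baz_", 12), ("baz_", 8), ("qux_", 16)]
merged = merge_common(requests)
print(merged)
```

Each name survives once, with the largest requested size, which is what ends up being reserved in the .bss section.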
With the small
memory model 32-bit addressing instructions are used to address symbols on x86_64. This makes code smaller and also faster. Some assembly output when using small
memory model:
movl $bar.1535, %ebx <---- Instruction length saving
...
movl %eax, baz_+4(%rip) <---- Problem!!
...
.local bar.1535
.comm bar.1535,2575411200,32
...
.comm baz_,12,16
This uses a 32-bit move instruction (5 bytes long) to put the value of the bar.1535
symbol (this value equals the address of the symbol location) into the lower 32 bits of the RBX
register (the upper 32 bits get zeroed). The bar.1535
symbol itself is allocated using the .comm
directive. Memory for the baz
COMMON block is allocated afterwards. Because bar.1535
is very large, baz_
ends up more than 2 GiB from the start of the .bss
section. This poses a problem for the second movl
instruction, since an offset that does not fit in a signed 32-bit integer would be needed relative to RIP
in order to address the b
variable, into which the value of EAX
has to be moved. This is only detected at link time. The assembler itself does not know the appropriate offset since it doesn't know what the value of the instruction pointer (RIP
) would be (it depends on the absolute virtual address where the code is loaded and this is determined by the linker), so it simply puts an offset of 0
and then creates a relocation request of type R_X86_64_PC32
. It instructs the linker to patch the value of 0
with the real offset value. But the linker cannot do that since the offset value would not fit inside a signed 32-bit integer, and hence it bails out.
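The arithmetic behind the failed relocation can be checked with a few lines of Python. This is only a rough sketch: the real displacement depends on the final layout chosen by the linker, so the size of bar.1535 is used here as a lower bound on the distance, assuming the code is placed before the .bss section and baz_ is placed after bar.1535. The helper name fits_signed_32 is made up for illustration.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_signed_32(value):
    """Return True if value can be stored in a signed 32-bit field."""
    return INT32_MIN <= value <= INT32_MAX


# Size of bar.1535 in bytes, taken from the .comm directive.
bar_size = 2575411200

# Assumption: the RIP-relative displacement to baz_ is at least bar_size,
# because the whole array sits between the code and baz_.
print(fits_signed_32(bar_size))   # False
print(bar_size - INT32_MAX)       # 427927553
```

The displacement overshoots the largest signed 32-bit value by more than 400 MB, so no value the linker could write into the 4-byte field would be correct.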
With the medium
memory model in place things look like this:
movabsq $bar.1535, %r10
...
movl %eax, baz_+4(%rip)
...
.local bar.1535
.largecomm bar.1535,2575411200,32
...
.comm baz_,12,16
First a 64-bit immediate move instruction (10 bytes long) is used to put the 64-bit value which represents the address of bar.1535
into register R10
. Memory for the bar.1535
symbol is allocated using the .largecomm
directive and thus it ends up in the .lbss
section of the ELF executable. .lbss
is used to store symbols which might not fit in the first 2 GiB (and hence should not be addressed using 32-bit instructions or RIP-relative addressing), while smaller things go to .bss
(baz_
is still allocated using .comm
and not .largecomm
). Since the .lbss
section is placed after the .bss
section in the ELF linker script, baz_
remains reachable with 32-bit RIP-relative addressing.
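The effect of the section ordering can be sketched with a toy layout calculation in Python. It is a simplification under stated assumptions: alignment padding is ignored, offsets are measured from the start of the uninitialised data area rather than from RIP, and the layout helper is invented for illustration. Only the two sizes come from the assembly above.

```python
def layout(symbols, base=0):
    """Assign consecutive start offsets to (name, size) pairs, in order."""
    offsets = {}
    cursor = base
    for name, size in symbols:
        offsets[name] = cursor
        cursor += size
    return offsets


BAR_SIZE = 2575411200   # bar.1535, from .comm / .largecomm
BAZ_SIZE = 12           # baz_, from .comm
INT32_MAX = 2**31 - 1

# Small model: both symbols share .bss and bar.1535 comes first.
small = layout([("bar.1535", BAR_SIZE), ("baz_", BAZ_SIZE)])

# Medium model: baz_ stays in .bss, bar.1535 moves to .lbss, which follows .bss.
medium = layout([("baz_", BAZ_SIZE), ("bar.1535", BAR_SIZE)])

print(small["baz_"] > INT32_MAX)    # True: baz_ is pushed out of reach
print(medium["baz_"] > INT32_MAX)   # False: baz_ stays near the start
```

In the medium-model ordering only bar.1535 is affected by its own size, and that symbol is addressed with a full 64-bit immediate anyway.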
All addressing modes are described in the System V ABI: AMD64 Architecture Processor Supplement. It is heavy technical reading, but a must-read for anybody who really wants to understand how 64-bit code works on most x86_64 Unixes.
When an ALLOCATABLE
array is used instead, gfortran
allocates heap memory (most likely implemented as an anonymous memory map given the large size of the allocation):
movl $2575411200, %edi
...
call malloc
movq %rax, %rdi
This is basically RDI = malloc(2575411200)
. From then on elements of bar
are accessed by using positive offsets from the value stored in RDI
:
movl 51190040(%rdi), %eax
movl %eax, baz_+4(%rip)
For locations that are more than 2 GiB from the start of bar
, a more elaborate method is used. E.g. to implement b = bar(12,144*144*450)
gfortran
emits:
; Some computations that leave the offset in RAX
movl (%rdi,%rax), %eax
movl %eax, baz_+4(%rip)
This code is not affected by the memory model since nothing is assumed about the address where the dynamic allocation would be made. Also, since the array is not passed around, no descriptor is built. If you add another function that takes an assumed-shape array and pass bar
to it, a descriptor for bar
is created as an automatic variable (i.e. on the stack of foo
). If the array is made static with the SAVE
attribute, the descriptor is placed in the .bss
section:
movl $bar.1580, %edi
...
; RAX still holds the address of the allocated memory as returned by malloc
; Computations, computations
movl -232(%rax,%rdx,4), %eax
movl %eax, baz_+4(%rip)
The first move prepares the argument of a function call (in my sample case call boo(bar)
where boo
has an interface that declares it as taking an assumed-shape array). It moves the address of the array descriptor of bar
into EDI
. This is a 32-bit immediate move so the descriptor is expected to be in the first 2 GiB. Indeed, it is allocated in the .bss
in both small
and medium
memory models like this:
.local bar.1580
.comm bar.1580,72,32
Solution 2:
No, large static arrays (such as your bar
) may exceed the limit if you do not use -mcmodel=medium
. But allocatables are better, of course. For allocatables, only the array descriptor must fit into 2 GB, not the whole array.
From GCC reference:
-mcmodel=small
Generate code for the small code model: the program and its symbols must be linked in the lower 2 GB of the address space. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.
-mcmodel=kernel
Generate code for the kernel code model. The kernel runs in the negative 2 GB of the address space. This model has to be used for Linux kernel code.
-mcmodel=medium
Generate code for the medium model: The program is linked in the lower 2 GB of the address space but symbols can be located anywhere in the address space. Programs can be statically or dynamically linked, but building of shared libraries are not supported with the medium model.
-mcmodel=large
Generate code for the large model: This model makes no assumptions about addresses and sizes of sections. Currently GCC does not implement this model.