Do zero-initialized variables in the .bss section occupy space in the ELF file?
Solution 1:
It has been some time since I worked with ELF, but I think I still remember this stuff. No, the file does not physically contain those zeros. If you look at the program headers of an ELF file, you will see that each header has two sizes: one is the size in the file, the other is the size the segment has when it is loaded into virtual memory (readelf -l ./a.out):
Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x000e0 0x000e0 R E 0x4
  INTERP         0x000114 0x08048114 0x08048114 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x00454 0x00454 R E 0x1000
  LOAD           0x000454 0x08049454 0x08049454 0x00104 0x61bac RW  0x1000
  DYNAMIC        0x000468 0x08049468 0x08049468 0x000d0 0x000d0 RW  0x4
  NOTE           0x000128 0x08048128 0x08048128 0x00020 0x00020 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
Headers of type LOAD are the ones that are copied into virtual memory when the file is loaded for execution. Other headers contain other information, such as the shared libraries that are needed. As you can see, FileSiz and MemSiz differ significantly for the header that contains the .bss section (the second LOAD one):
0x00104 (file-size) 0x61bac (mem-size)
For this example code:
int a[100000];
int main() { }
The ELF specification says that when a segment's mem-size is greater than its file-size, the extra bytes are filled with zeros in virtual memory. The segment-to-section mapping of the second LOAD header looks like this:
03 .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
So there are some other sections in there too: .ctors and .dtors are for C++ constructors/destructors, and .jcr is the same kind of thing for Java. Then it contains the .dynamic section and other data useful for dynamic linking (I believe .dynamic is the place that lists the needed shared libraries, among other things). After that comes the .data section, which contains initialized globals and local static variables. At the end, the .bss section appears, which is filled with zeros at load time because file-size does not cover it.
By the way, you can see which output section a particular symbol is going to be placed in by using the -M linker option. With gcc, use -Wl,-M to pass the option through to the linker. For the example above, the map shows that a is allocated within .bss. It may help you verify that your uninitialized objects really end up in .bss and not somewhere else:
.bss            0x08049560    0x61aa0
 [many input .o files...]
 *(COMMON)
 *fill*         0x08049568       0x18 00
 COMMON         0x08049580    0x61a80 /tmp/cc2GT6nS.o
                0x08049580                a
                0x080ab000                . = ALIGN ((. != 0x0)?0x4:0x1)
                0x080ab000                . = ALIGN (0x4)
                0x080ab000                . = ALIGN (0x4)
                0x080ab000                _end = .
By default, GCC (before version 10) keeps uninitialized globals in a COMMON section, for compatibility with old compilers that allow a global to be defined twice in a program without multiple-definition errors. Use -fno-common to make GCC use the .bss section in object files instead; since GCC 10 this is the default. It does not make a difference for the final linked executable because, as you can see, the data ends up in a .bss output section anyway. This is controlled by the linker script, which you can display with ld -verbose. But that shouldn't scare you, it's just an internal detail. See the gcc manpage.
Solution 2:
The .bss section in an ELF file is used for static data that has no explicit initializer in the program but is guaranteed to be set to zero at runtime. Here's a little example that will explain the difference.
int main() {
    static int bss_test1[100];
    static int bss_test2[100] = {0};
    return 0;
}
In this case bss_test1 is placed into .bss since it is uninitialized. bss_test2, however, is placed into the .data section along with a bunch of zeros, at least with the compiler used here; current GCC versions put zero-initialized objects into .bss as well unless -fno-zero-initialized-in-bss is given. The runtime loader basically allocates the amount of space reserved for .bss and zeroes it out before any userland code begins executing.
You can see the difference using objdump, nm, or similar utilities:
moozletoots$ objdump -t a.out | grep bss_test
08049780 l O .bss 00000190 bss_test1.3
080494c0 l O .data 00000190 bss_test2.4
This is usually one of the first surprises that embedded developers run into: never initialize statics to zero explicitly. The runtime loader (usually) takes care of that. As soon as you initialize anything explicitly, you are telling the compiler/linker to include the data in the executable image.