Why is the .bss segment required?
The reason is to reduce program size. Imagine that your C program runs on an embedded system, where the code and all constants are stored in true ROM (flash memory). In such systems, an initial "copy-down" must be executed to initialize all objects with static storage duration before main() is called. It will typically look like this pseudocode:
for(i=0; i<all_explicitly_initialized_objects; i++)
{
    .data[i] = init_value[i];
}

memset(.bss, 0, all_implicitly_initialized_objects);
Here .data and .bss are stored in RAM, while init_value is stored in ROM. If they had been one segment, the ROM would have had to be filled with a lot of zeroes, increasing ROM size significantly.
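The pseudocode above can be made concrete with a small hosted simulation. This is a sketch that uses ordinary arrays in place of the ROM image and the two RAM regions. The names rom_init_image, ram_data, ram_bss, startup_copy_down, and rom_bytes_needed are all hypothetical. Real startup code gets these addresses from linker-script symbols, and their names vary by toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes for the simulated sections. */
#define DATA_WORDS 4
#define BSS_WORDS  8

/* Simulated ROM: holds only the initial values of explicitly initialized
   objects. Nothing is stored for .bss, which is what saves ROM space. */
const int rom_init_image[DATA_WORDS] = { 10, 20, 30, 40 };

/* Simulated RAM, pre-filled with -1 to mimic garbage after reset. */
int ram_data[DATA_WORDS] = { -1, -1, -1, -1 };
int ram_bss[BSS_WORDS]   = { -1, -1, -1, -1, -1, -1, -1, -1 };

/* The copy-down that startup code runs before main(): copy the .data
   image out of ROM, then zero-fill the .bss region. */
void startup_copy_down(void)
{
    for (size_t i = 0; i < DATA_WORDS; i++) {
        ram_data[i] = rom_init_image[i];
    }
    memset(ram_bss, 0, sizeof ram_bss);
}

/* ROM bytes spent on initial values: only the .data image counts. */
size_t rom_bytes_needed(void)
{
    return sizeof rom_init_image;
}
```

After one call to startup_copy_down(), ram_data holds 10, 20, 30, 40 and every element of ram_bss is 0. Meanwhile, rom_bytes_needed() counts only the four .data words, because .bss needs nothing in ROM but its size.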
Executables that are loaded from disk into RAM work similarly, though of course they have no true ROM. In that case, the executable file holds the initial values instead.
Also, memset is likely implemented in very efficient assembly, so the zero-filling part of the startup copy-down can run faster.
The .bss segment is an optimization. The entire .bss segment is described by a single number, probably 4 or 8 bytes, that gives its size in the running process. By contrast, the .data section in the executable file is as big as the sum of the sizes of the initialized variables. Thus, .bss makes executables smaller and quicker to load. Without it, the variables could be placed in the .data segment and explicitly initialized to zeroes, and the program would be hard-pressed to tell the difference. (In detail, the addresses of the objects in .bss would probably differ from the addresses they would have had in the .data segment.)
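The point that "the program would be hard-pressed to tell the difference" can be sketched in C. The two arrays below read identically at run time, and their names are made up for illustration. Which section each array lands in is a compiler decision. GCC, for instance, puts zero-initialized objects in .bss by default unless you build with -fno-zero-initialized-in-bss, so the reliable way to confirm placement is to inspect the object file (for example with objdump -t).

```c
#include <assert.h>
#include <stddef.h>

#define N 1000

/* No initializer: C guarantees zero for objects with static storage
   duration, and compilers typically place this array in .bss. */
int implicit_zero[N];

/* Explicit zeros: the values are the same. A toolchain could store this
   in .data, although GCC by default still puts all-zero objects in .bss. */
int explicit_zero[N] = { 0 };

/* Returns 1 if every element of both arrays is zero, otherwise 0. */
int both_all_zero(void)
{
    for (size_t i = 0; i < N; i++) {
        if (implicit_zero[i] != 0 || explicit_zero[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```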
In the first program, a would be in the .data segment and b would be in the .bss segment of the executable. Once the program is loaded, the distinction becomes immaterial. At run time, b occupies 20 * sizeof(int) bytes.
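The question's first program is not reproduced here. The following hypothetical declarations have the shape the answer describes. Because a has an initializer, its values must be stored in the executable's .data. Because b has none, only b's size needs to be recorded for .bss. The initializer values of a and the helper b_runtime_bytes are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the question's variables. */
int a[20] = { 1, 2, 3 };   /* initializer: typically placed in .data */
int b[20];                 /* no initializer: typically placed in .bss */

/* Bytes b occupies at run time, whichever section described it. */
size_t b_runtime_bytes(void)
{
    return sizeof b;   /* equals 20 * sizeof(int) */
}
```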
In the second program, var is allocated space and the assignment in main() modifies that space. It so happens that the space for var was described in the .bss segment rather than the .data segment, but that doesn't affect the way the program behaves when running.
From Assembly Language Step-by-Step: Programming with Linux by Jeff Duntemann, regarding the .data section:
The .data section contains data definitions of initialized data items. Initialized data is data that has a value before the program begins running. These values are part of the executable file. They are loaded into memory when the executable file is loaded into memory for execution.
The important thing to remember about the .data section is that the more initialized data items you define, the larger the executable file will be, and the longer it will take to load it from disk into memory when you run it.
And regarding the .bss section:
Not all data items need to have values before the program begins running. When you’re reading data from a disk file, for example, you need to have a place for the data to go after it comes in from disk. Data buffers like that are defined in the .bss section of your program. You set aside some number of bytes for a buffer and give the buffer a name, but you don’t say what values are to be present in the buffer.
There’s a crucial difference between data items defined in the .data section and data items defined in the .bss section: data items in the .data section add to the size of your executable file. Data items in the .bss section do not. A buffer that takes up 16,000 bytes (or more, sometimes much more) can be defined in .bss and add almost nothing (about 50 bytes for the description) to the executable file size.
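A C analogue of Duntemann's example is sketched below, assuming a hypothetical 16,000-byte read buffer. If the buffer is declared at file scope without an initializer, the toolchain can place it in .bss, and the executable then records only its size. The names read_buffer, BUFFER_SIZE, and fill_buffer are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define BUFFER_SIZE 16000

/* No initializer: typically reserved in .bss, so it adds almost nothing
   to the executable file even though it occupies 16,000 bytes at run time. */
char read_buffer[BUFFER_SIZE];

/* Reads up to BUFFER_SIZE bytes from fp into read_buffer and returns
   the number of bytes actually read. */
size_t fill_buffer(FILE *fp)
{
    return fread(read_buffer, 1, sizeof read_buffer, fp);
}
```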
Well, first of all, those variables in your example aren't uninitialized; C specifies that static variables not otherwise initialized are initialized to 0.
So the reason for .bss is to have smaller executables, saving space and allowing faster loading of the program, as the loader can just allocate a bunch of zeroes instead of having to copy the data from disk.
When the program runs, the program loader loads .data from the file into memory and sets up zero-filled memory for .bss. Writes to objects residing in .data or .bss therefore go only to memory; they are never flushed back to the binary on disk.
The System V ABI 4.1 (1997) (AKA ELF specification) also contains the answer:
.bss
This section holds uninitialized data that contribute to the program’s memory image. By definition, the system initializes the data with zeros when the program begins to run. The section occupies no file space, as indicated by the section type, SHT_NOBITS.
This says that the section name .bss is reserved and has special effects; in particular, it occupies no file space, which is its advantage over .data.
The downside, of course, is that all bytes must be set to 0 when the OS places them in memory. That is more restrictive, but it is a common use case and works fine for uninitialized variables.
The documentation for the SHT_NOBITS section type repeats that statement:
sh_size
This member gives the section’s size in bytes. Unless the section type is SHT_NOBITS, the section occupies sh_size bytes in the file. A section of type SHT_NOBITS may have a non-zero size, but it occupies no space in the file.
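The quoted sh_size rule can be written as a pair of small functions. This sketch assumes a Linux/glibc system, where <elf.h> provides Elf64_Shdr and the SHT_ and SHF_ constants. The function names section_file_bytes and section_memory_bytes are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

/* Bytes a section occupies in the file: SHT_NOBITS sections (such as
   .bss) occupy none, whatever their sh_size says. */
uint64_t section_file_bytes(const Elf64_Shdr *sh)
{
    if (sh->sh_type == SHT_NOBITS) {
        return 0;
    }
    return sh->sh_size;
}

/* Bytes a section occupies in the process image: sh_size for every
   section flagged SHF_ALLOC, including SHT_NOBITS ones. */
uint64_t section_memory_bytes(const Elf64_Shdr *sh)
{
    return (sh->sh_flags & SHF_ALLOC) ? sh->sh_size : 0;
}
```

For a .bss-like header with sh_size 16000, section_file_bytes returns 0 while section_memory_bytes returns 16000. That pairing is exactly what makes .bss cheap on disk.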
The C standard says nothing about sections, but on Linux we can easily verify where the variable is stored with objdump and readelf, and conclude that uninitialized globals are in fact stored in the .bss section. See, for example, this answer: What happens to a declared, uninitialized variable in C?
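A small translation unit like the one below, saved under a hypothetical name such as placement.c, is enough for that kind of check. Compile it with gcc -c placement.c, then run objdump -t placement.o or readelf -s placement.o to list the section of each symbol. One caveat applies: GCC versions before 10 default to -fcommon. With those versions, uninitialized_global may show up as a common symbol (*COM* in objdump, COM in readelf) in the object file and only land in .bss at link time. Building with -fno-common places it in .bss directly.

```c
#include <assert.h>

/* Hypothetical globals for inspecting section placement. */
int initialized_global = 42;   /* expected in .data */
int uninitialized_global;      /* expected in .bss (or COMMON, see above) */
static int file_static;        /* expected in .bss */

/* Uses all three objects so the compiler keeps them. */
int read_globals(void)
{
    return initialized_global + uninitialized_global + file_static;
}
```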