Kernel zeroes memory?

Solution 1:

On any modern operating system, the only way newly obtained memory will contain nonzero values is if memory previously freed by your program got reused by malloc. When new memory is obtained from the operating system (kernel), it is initially purely virtual. It has no physical existence; instead it is mapped as copy-on-write mappings of a single shared memory page that's full of 0 bytes. The first time you attempt to write to it, the kernel will trap the write, allocate a new page of physical memory, copy the contents of the original page (which in this case are all 0 bytes) to the new page, and then resume your program. If the kernel knows the newly allocated physical memory is already zero-filled, it might even be able to optimize out the copy step.

This procedure is both necessary and efficient. It's necessary because handing over memory that might contain private data from the kernel or another user's processes to your process would be a critical security breach. It's efficient because no zeroing is performed at allocation time; the "zero-filled" pages are just references to a shared zero page.

Solution 2:

From what I read in Linux Kernel Development, the kernel does zero pages, because they may contain kernel data that a user program could interpret and use in some way to gain access to the system.

malloc asks the kernel for more pages, so the kernel is responsible for that memory that you are receiving.

Solution 3:

The first time you malloc a chunk of memory there's a fair chance it will be zero, because memory allocated by a system call (sbrk, mmap) is zeroed by the kernel. But if you free and malloc again, the memory is recycled and may not contain zeros.
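The recycling can be observed with a small experiment. This is only a sketch: the function name `recycled_block_has_garbage` is made up for illustration, and the result depends on the allocator and the compiler. The contents of a malloc'd block are indeterminate, so either outcome is legitimate.

```c
#include <stdlib.h>

/* Hypothetical helper: fill a block, free it, allocate the same size again,
 * and report what the new block contains.
 * Returns 1 if nonzero bytes were found, 0 if it read as all zeros,
 * -1 if an allocation failed. Neither 0 nor 1 is guaranteed. */
int recycled_block_has_garbage(size_t len)
{
    /* volatile so the compiler does not drop the stores before free() */
    volatile unsigned char *p = malloc(len);
    if (p == NULL)
        return -1;
    for (size_t i = 0; i < len; i++)
        p[i] = 'b';
    free((void *)p);

    /* A typical allocator hands the block that was just freed back out. */
    unsigned char *q = malloc(len);
    if (q == NULL)
        return -1;
    int garbage = 0;
    for (size_t i = 0; i < len; i++) {
        if (q[i] != 0) {
            garbage = 1;
            break;
        }
    }
    free(q);
    return garbage;
}
```

Calling `recycled_block_has_garbage(256)` from main will often return 1, because the second malloc tends to return the block that was just freed, still holding the old bytes plus whatever bookkeeping the allocator wrote into it.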

Solution 4:

You'll find that memory is zeroed on most operating systems that have isolation between processes. The reason is that a process must not be allowed to peek at the memory released by another process, so a memory page must be erased between the time it's freed by one process and the time it's reused by another. In practice, erased means zeroed, and the memory is usually zeroed at the time it's allocated by the process.

When you call malloc in your toy program, the memory hasn't been used for anything else yet. So it's still fresh from the kernel, full of zeros. If you try in a real program that's already allocated and freed a lot of heap blocks, you'll find that memory that's already been used by your process still contains whatever garbage you (or the memory management system) may have put there.

Solution 5:

As already illustrated, the key difference is first-time allocation vs. reuse of previously freed memory. If you try:

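#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
    char *a, tst;
    int tries = 0;
    do {
        a = malloc(50000000);
        if (a == NULL) return 1;
        a[49999999] = '\0';
        printf("%.50s\n%p\n", a, (void *)a); // prints nothing the 1st time, but bbbb.... once the block is reused
        tst = a[5000];                       // remember what the block held before overwriting it
        memset(a, 'b', 50000000);
        free(a);
    } while (tst == '\0' && ++tries < 3);    // bounded, in case free() returns the block to the kernel
    return 0;
}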

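it will most likely go through the loop twice, printing no text on the first pass and a run of b's on the second, at least if both passes get the same pointer. If the allocator hands a block this large back to the kernel on free(), as some do, every new block comes back zeroed instead.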

The key point is that the memory block returned by malloc() has indeterminate contents. It may or may not be zeros, depending on how the program has allocated and freed memory in the past (or on what memory debugging facilities are in use).

If you want to guarantee contents, you need calloc() or explicit initialization after allocation.
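Both options can be sketched as follows. The helper names (`alloc_zeroed_calloc`, `alloc_zeroed_memset`, `all_zero`) are invented for this example; only calloc(), malloc() and memset() are standard.

```c
#include <stdlib.h>
#include <string.h>

/* Option 1: calloc() returns memory with all bits zero, or NULL on failure. */
unsigned char *alloc_zeroed_calloc(size_t n)
{
    return calloc(n, 1);
}

/* Option 2: malloc() followed by explicit initialization. */
unsigned char *alloc_zeroed_memset(size_t n)
{
    unsigned char *p = malloc(n);
    if (p != NULL)
        memset(p, 0, n);
    return p;
}

/* Illustration-only check: returns 1 if the first n bytes are all zero. */
int all_zero(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i] != 0)
            return 0;
    }
    return 1;
}
```

With either function the caller can rely on the block being zeroed regardless of what the allocator recycled, and must still check for NULL and free() the block afterwards.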

The system's integrity / data separation guarantee, on the other hand, means that any new address space requested from the system - whether via sbrk() or mmap(MAP_ANON) - must be zero-initialized, as any other contents would constitute a security breach.
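The zero-fill of fresh anonymous mappings can be checked directly by bypassing malloc. This is a sketch assuming a Linux or other Unix-like system where mmap() supports anonymous mappings; `fresh_mapping_is_zero` is a name chosen for this example.

```c
#define _DEFAULT_SOURCE          /* expose MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON   /* older BSD-style spelling */
#endif

/* Maps len bytes of fresh anonymous memory and scans it.
 * Returns 1 if every byte is zero, 0 if not, -1 if mmap() failed. */
int fresh_mapping_is_zero(size_t len)
{
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int zero = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            zero = 0;
            break;
        }
    }
    munmap(p, len);
    return zero;
}
```

On such a system `fresh_mapping_is_zero((size_t)1 << 20)` should return 1 no matter what the process did before, in contrast to malloc(), which may return recycled memory.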