Will malloc implementations return freed memory back to the system?
I have a long-running application that allocates and frees memory frequently. Will any malloc implementation return freed memory back to the system?
What is, in this respect, the behavior of:
- ptmalloc 1, 2 (glibc default) or 3
- dlmalloc
- tcmalloc (Google's thread-caching malloc)
- Solaris 10-11 default malloc and mtmalloc
- FreeBSD 8 default malloc (jemalloc)
- Hoard malloc?
Update
If I have an application whose memory consumption differs greatly between, say, daytime and nighttime, can I force any of these malloc implementations to return freed memory to the system?
Without such a return, the freed memory will be swapped out and back in many times, even though it contains only garbage.
The following analysis applies only to glibc (based on the ptmalloc2 algorithm).
There are certain options that seem helpful to return the freed memory back to the system:
- mallopt() (defined in malloc.h) provides an option to set the trim threshold through the M_TRIM_THRESHOLD parameter. This is the amount of free memory (in bytes) that must accumulate at the top of the data segment before glibc trims it: once the free space at the top exceeds this threshold, glibc invokes brk() to give memory back to the kernel.
  The default value of M_TRIM_THRESHOLD on Linux is 128K; setting a smaller value might save space.
  The same behavior can be achieved by setting the trim threshold in the MALLOC_TRIM_THRESHOLD_ environment variable, with no source changes at all.
  However, preliminary test programs run with M_TRIM_THRESHOLD have shown that even though the memory allocated by malloc does return to the system, the remaining portion of the actual chunk of memory (the arena) initially requested via brk() tends to be retained.
- It is possible to trim the memory arena and give any unused memory back to the system by calling malloc_trim(pad) (defined in malloc.h). This function resizes the data segment, leaving at least pad bytes at the end of it, and fails if less than one page worth of bytes can be freed. Segment size is always a multiple of one page, which is 4,096 bytes on i386.
  This modified behavior of free() using malloc_trim could be implemented with the malloc hook functionality. This would not require any source code changes to the core glibc library.
- Using the madvise() system call inside the free implementation of glibc.
Most implementations don't bother identifying those (relatively rare) cases where entire "blocks" (of whatever size suits the OS) have been freed and could be returned, but there are of course exceptions. For example, and I quote from the Wikipedia page, in OpenBSD:
On a call to free, memory is released and unmapped from the process address space using munmap. This system is designed to improve security by taking advantage of the address space layout randomization and gap page features implemented as part of OpenBSD's mmap
system call, and to detect use-after-free bugs—as a large memory allocation is completely unmapped after it is freed, further use causes a segmentation fault and termination of the program.
Most systems are not as security-focused as OpenBSD, though.
Knowing this, when I'm coding a long-running system that has a known-to-be-transitory requirement for a large amount of memory, I always try to fork the process: the parent then just waits for results from the child (typically on a pipe), the child does the computation (including memory allocation), returns the results (on said pipe), then terminates. This way, my long-running process won't be uselessly hogging memory during the long times between occasional "spikes" in its demand for memory. Alternative strategies include switching to a custom memory allocator for such special requirements (C++ makes it reasonably easy, though languages with virtual machines underneath such as Java and Python typically don't).
I had a similar problem in my app. After some investigation I noticed that, for some reason, glibc does not return memory to the system when the allocated objects are small (in my case less than 120 bytes).
Look at this code:
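#include <cstddef>
#include <list>
#include <malloc.h>  // malloc_stats() (glibc-specific)

// A payload of exactly s bytes; each std::list node adds its own pointers.
template<std::size_t s> class x { char data[s]; };

int main(int argc, char** argv) {
    typedef x<100> X;
    std::list<X> lx;
    for (std::size_t i = 0; i < 500000; ++i) {
        lx.push_back(X());
    }
    lx.clear();      // every node is freed here
    malloc_stats();  // prints allocator statistics to stderr
    return 0;
}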
Program output:
Arena 0:
system bytes = 64069632
in use bytes = 0
Total (incl. mmap):
system bytes = 64069632
in use bytes = 0
max mmap regions = 0
max mmap bytes = 0
About 64 MB is not returned to the system. When I changed the typedef to:
typedef x<110> X;
program output looks like this:
Arena 0:
system bytes = 135168
in use bytes = 0
Total (incl. mmap):
system bytes = 135168
in use bytes = 0
max mmap regions = 0
max mmap bytes = 0
Almost all of the memory was freed. I also noticed that calling malloc_trim(0) released the memory to the system in either case.
Here is the output after adding malloc_trim to the code above:
Arena 0:
system bytes = 4096
in use bytes = 0
Total (incl. mmap):
system bytes = 4096
in use bytes = 0
max mmap regions = 0
max mmap bytes = 0
I am dealing with the same problem as the OP. So far, it seems possible with tcmalloc. I found two solutions:
- Compile your program with tcmalloc linked, then launch it as:
  env TCMALLOC_RELEASE_RATE=100 ./my_pthread_soft
  The documentation mentions that "Reasonable rates are in the range [0,10]", but 10 doesn't seem to be enough for me (i.e. I see no change).
- Find a place in your code where it would be useful to release all the freed memory, and add this code there:
  #include "google/malloc_extension_c.h" // C include
  #include "google/malloc_extension.h"   // C++ include
  /* ... */
  MallocExtension_ReleaseFreeMemory();
The second solution has been very effective in my case. The first would be great, but it isn't very successful: it is complicated to find the right number, for example.
Of the ones you list, only Hoard will return memory to the system... but whether it can actually do that will depend a lot on your program's allocation behaviour.