Memory Leak Detectors Working Principle
Solution 1:
There are a couple of different ways that leak detectors work. You can replace the implementation of malloc
and free
with ones that can track more information during allocation and are not concerned with performance. This is similar to how dmalloc
works. In general, any address that is malloc
'ed but not free
'd is leaked.
The basic implementation is actually pretty simple. You just maintain a lookup table of every allocation and its line number, and remove the entry when it is freed. Then when the program is done you can list all leaked memory. The hard part is determining when and where the allocation should have been freed. This is even harder when there are multiple pointers to the same address.
In practice, you'll probably want more than just the single line number, but rather a stack trace for the lost allocations.
Another approach is how valgrind works which implements an entire virtual machine to keep track of addresses and memory references and associated bookkeeping. The valgrind approach is much more expensive, but also much more effective as it can also tell you about other types of memory errors like out of bounds reads or writes.
Valgrind essentially instruments the underlying instructions and can track when a given memory address has no more references. It can do this by tracking assignments of addresses, and so it can tell you not just that a piece of memory was lost, but exactly when it became lost.
C++ makes things a little harder for both types of leak detectors because it adds the new
and delete
operators. Technically new
can be a completely different source of memory than malloc
. However, in practice many real C++ implementations just use malloc
to implement new
or have an option to use malloc
instead of the alternate approach.
Also higher level languages like C++ tend to have alternative higher level ways of allocating memory like std::vector
or std::list
. A basic leak detector would report the potentially many allocations made by the higher level modes separately. That's much less useful than saying the entire container was lost.
Solution 2:
Here's a published technical paper on how our CheckPointer tool works.
Fundamentally it tracks the lifetimes of all values (heap and stack), and their sizes according their types as defined by the language. This allows CheckPointer to find not only leaks, but out-of-array bound accesses, even for arrays in the stack, which valgrind won't do.
In particular, it analyzes the source code to find all pointer uses. (This is quite the task just by itself).
It keeps track of pointer meta data for each pointer, consisting of
- A reference to the object meta data for the heap-allocated object or global or local variable orfunction pointed to by the pointer and
- The address range of the (sub)object of the object that the pointer may currently access. This may be smaller than the address range of the whole object; e.g. if you take the address of a struct member, the instrumented source code will only allow access to that member when using the resulting pointer.
It also tracks the kind and location of each object, i.e. whether it is a function, a global, thread-local or local variable, heap-allocated memory, or a string literal constant:
- The address range of the object that may be safely accessed, and
- For each pointer stored in the heap-allocated object or variable, a reference to the pointer metadata for that pointer.
All this tracking is accomplished by transforming the original program source, into a program which does what the original program does, and interleaves various meta-data checking or updating routines. The resulting program is compiled and run. Where a meta-data check fails at runtime, a backtrace is provided with a report of the type of failure (invalid pointer, pointer outside valid bounds, ...)