Why would one replace default new and delete operators?

One may try to replace new and delete operators for a number of reasons, namely:

To Detect Usage Errors:

There are a number of ways in which incorrect use of new and delete can lead to the dreaded beasts of undefined behavior and memory leaks. An example of each:
Calling delete more than once on the same newed memory (undefined behavior), and never calling delete on memory allocated with new (a leak).
If an overloaded operator new keeps a list of allocated addresses and the overloaded operator delete removes addresses from that list, such usage errors become easy to detect.
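
The address list described above could look roughly like this. It is a minimal sketch, not a production tool: `Tracked`, `live_count` and `bad_delete_count` are names made up for illustration, the operators are class-specific rather than global so that the bookkeeping container can safely use the default allocator, and nothing here is thread-safe.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>
#include <unordered_set>

// Hypothetical class; only its allocation functions matter here.
class Tracked {
public:
    int payload = 0;

    static void* operator new(std::size_t size) {
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        try {
            live().insert(p);          // remember every address handed out
        } catch (...) {
            std::free(p);              // bookkeeping failed, do not leak
            throw;
        }
        return p;
    }

    static void operator delete(void* p) noexcept {
        if (!p) return;                // deleting null is always a no-op
        if (live().erase(p) == 0) {
            // Unknown address: a second delete, or a pointer never issued.
            ++bad_deletes();
            std::fprintf(stderr, "bad delete of %p\n", p);
            return;                    // refuse to free it again
        }
        std::free(p);
    }

    // Anything still registered at program exit was leaked.
    static std::size_t live_count() { return live().size(); }
    static std::size_t bad_delete_count() { return bad_deletes(); }

private:
    static std::unordered_set<void*>& live() {
        static std::unordered_set<void*> addresses;
        return addresses;
    }
    static std::size_t& bad_deletes() {
        static std::size_t count = 0;
        return count;
    }
};
```

Checking `Tracked::live_count()` just before the program exits reports leaks, and a non-zero `Tracked::bad_delete_count()` points at a double delete.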

Similarly, a variety of programming mistakes can lead to data overruns (writing beyond the end of an allocated block) and underruns (writing prior to the beginning of an allocated block).
An overloaded operator new can over-allocate blocks and put known byte patterns ("signatures") before and after the memory made available to clients. The overloaded operator delete can then check whether the signatures are still intact. If they are not, an overrun or underrun occurred sometime during the life of the allocated block, and operator delete can log that fact, along with the value of the offending pointer, which makes for good diagnostic information.
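
A sketch of the signature idea, under a few assumptions: the `guard` namespace, the `Packet` class and the `0xDEADBEEF` pattern are all invented for this example, the operators are class-specific to keep the sketch small, and overflow of the size computation is not checked.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <new>

namespace guard {

constexpr std::uint32_t kSignature = 0xDEADBEEF;  // arbitrary known pattern
constexpr std::size_t kAlign = alignof(std::max_align_t);
// Space in front of the user block: the requested size plus the front
// signature, rounded up so the user block keeps maximal alignment.
constexpr std::size_t kFront =
    (sizeof(std::size_t) + sizeof(kSignature) + kAlign - 1) / kAlign * kAlign;

std::size_t damaged_blocks = 0;  // how many blocks failed the check

void* allocate(std::size_t size) {
    auto* raw = static_cast<unsigned char*>(
        std::malloc(kFront + size + sizeof(kSignature)));
    if (!raw) throw std::bad_alloc();
    unsigned char* user = raw + kFront;
    std::memcpy(raw, &size, sizeof(size));  // remember the requested size
    std::memcpy(user - sizeof(kSignature), &kSignature, sizeof(kSignature));
    std::memcpy(user + size, &kSignature, sizeof(kSignature));
    return user;
}

void release(void* p) noexcept {
    if (!p) return;
    auto* user = static_cast<unsigned char*>(p);
    unsigned char* raw = user - kFront;
    std::size_t size = 0;
    std::memcpy(&size, raw, sizeof(size));
    const bool front_ok = std::memcmp(user - sizeof(kSignature), &kSignature,
                                      sizeof(kSignature)) == 0;
    // Only trust the stored size if the front signature survived.
    const bool back_ok = front_ok &&
        std::memcmp(user + size, &kSignature, sizeof(kSignature)) == 0;
    if (!back_ok) {
        ++damaged_blocks;
        std::fprintf(stderr, "%s detected at %p\n",
                     front_ok ? "overrun" : "underrun", p);
    }
    std::free(raw);
}

}  // namespace guard

// Hypothetical client class that routes its allocations through the guards.
struct Packet {
    char data[16];
    static void* operator new(std::size_t size) { return guard::allocate(size); }
    static void operator delete(void* p) noexcept { guard::release(p); }
};
```

Writing one byte past `data` (or one byte before it) changes a signature byte, so the next `delete` of that `Packet` logs the pointer and bumps `guard::damaged_blocks`.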


To Improve Efficiency (speed & memory):

The new and delete operators work reasonably well for everybody, but optimally for nobody. This is because they are designed for general-purpose use. They have to accommodate allocation patterns ranging from the dynamic allocation of a few blocks that exist for the duration of the program to constant allocation and deallocation of a large number of short-lived objects. As a result, the operator new and operator delete that ship with compilers take a middle-of-the-road strategy.

If you have a good understanding of your program's dynamic memory usage patterns, you can often find that custom versions of operator new and operator delete outperform the default ones: they can be faster, or require less memory (up to 50% less). Of course, unless you are sure of what you are doing, this is not a good idea (don't even try it if you don't understand the intricacies involved).


To Collect Usage Statistics:

Before thinking of replacing new and delete to improve efficiency, as mentioned in the previous point, you should gather information about how your application/program uses dynamic allocation. You may want to collect information about:
Distribution of allocation block sizes,
Distribution of lifetimes,
Order of allocations (FIFO, LIFO, or random),
How usage patterns change over time, the maximum amount of dynamic memory in use, etc.
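
As a rough illustration of collecting such data, the sketch below replaces the global operator new with one that counts requests in power-of-two size buckets. The `alloc_stats` namespace and its members are invented names; the sketch ignores threads, skips the new-handler loop a full replacement would have, and does not track lifetimes.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

namespace alloc_stats {

constexpr std::size_t kBuckets = 32;
std::size_t histogram[kBuckets];  // requests per power-of-two size class
std::size_t total_calls;          // number of allocation requests
std::size_t total_bytes;          // sum of all requested sizes

// Bucket 0 holds size 0, bucket 1 holds 1, bucket 2 holds 2..3,
// bucket 3 holds 4..7, and so on; the last bucket catches the rest.
std::size_t bucket_for(std::size_t size) {
    std::size_t bucket = 0;
    while (size != 0 && bucket + 1 < kBuckets) {
        size >>= 1;
        ++bucket;
    }
    return bucket;
}

}  // namespace alloc_stats

// Replacement for the global allocation function.
void* operator new(std::size_t size) {
    ++alloc_stats::histogram[alloc_stats::bucket_for(size)];
    ++alloc_stats::total_calls;
    alloc_stats::total_bytes += size;
    void* p = std::malloc(size ? size : 1);  // never return null for size 0
    if (!p) throw std::bad_alloc();
    return p;
}

// Matching replacements for the global deallocation functions.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
```

Dumping `alloc_stats::histogram` at exit shows which block sizes dominate, which is the kind of evidence worth having before writing a specialized allocator.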

Also, sometimes you may need to collect usage information such as:
Count the number of dynamically allocated objects of a class,
Restrict the number of objects being created using dynamic allocation etc.

All of this information can be collected by replacing the default new and delete with custom versions and adding the diagnostic collection mechanism to the overloaded new and delete.
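
Counting and restricting the dynamically allocated objects of one class might be sketched as follows. `Connection` and the limit of two live objects are assumptions chosen for the example, and the counters are not thread-safe.

```cpp
#include <cstddef>
#include <new>

// Hypothetical class whose heap instances are counted and capped.
class Connection {
public:
    static constexpr std::size_t kMaxLive = 2;  // assumed limit

    static void* operator new(std::size_t size) {
        if (live_ >= kMaxLive) throw std::bad_alloc();  // refuse extra objects
        void* p = ::operator new(size);                 // defer to the default
        ++live_;
        ++total_;
        return p;
    }

    static void operator delete(void* p) noexcept {
        if (!p) return;
        --live_;
        ::operator delete(p);
    }

    static std::size_t live() { return live_; }    // currently allocated
    static std::size_t total() { return total_; }  // allocated so far

private:
    static std::size_t live_;
    static std::size_t total_;
};

std::size_t Connection::live_ = 0;
std::size_t Connection::total_ = 0;
```

With this in place, a third `new Connection` throws `std::bad_alloc` until one of the first two objects has been deleted.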


To compensate for suboptimal memory alignment in new:

Many computer architectures require that data of particular types be placed in memory at particular kinds of addresses. For example, an architecture might require that pointers occur at addresses that are a multiple of four (i.e., be four-byte aligned) or that doubles must occur at addresses that are a multiple of eight (i.e., be eight-byte aligned). Failure to follow such constraints can lead to hardware exceptions at run-time. Other architectures are more forgiving and allow misaligned access to work, though at reduced performance. The operator new that ships with some compilers doesn't guarantee eight-byte alignment for dynamic allocations of doubles. In such cases, replacing the default operator new with one that guarantees eight-byte alignment could yield big increases in program performance, which can be a good reason to replace the new and delete operators.


To cluster related objects near one another:

If you know that particular data structures are generally used together and you'd like to minimize the frequency of page faults when working on the data, it can make sense to create a separate heap for the data structures so they are clustered together on as few pages as possible. Custom placement versions of new and delete can make it possible to achieve such clustering.
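
One way to picture this is a small bump-pointer heap plus a placement form of operator new that draws from it. `Arena`, its 4096-byte buffer and `Node` are illustrative assumptions; a real separate heap would grow, and would need a policy for running destructors.

```cpp
#include <cstddef>
#include <new>

// Hypothetical fixed-size heap that hands out consecutive addresses.
class Arena {
public:
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        // Round the current offset up to the requested alignment.
        const std::size_t offset = (used_ + align - 1) / align * align;
        if (offset > sizeof(buffer_) || size > sizeof(buffer_) - offset)
            throw std::bad_alloc();
        used_ = offset + size;
        return buffer_ + offset;
    }
    std::size_t used() const { return used_; }

private:
    alignas(std::max_align_t) unsigned char buffer_[4096];
    std::size_t used_ = 0;
};

// Placement form, used as: new (arena) T(args...)
void* operator new(std::size_t size, Arena& arena) {
    return arena.allocate(size);
}

// Matching placement delete; called only if T's constructor throws.
// Individual blocks are not reclaimed, the arena is released as a whole.
void operator delete(void*, Arena&) noexcept {}

// Hypothetical node type whose instances should sit close together.
struct Node {
    int key;
    Node* next;
};
```

Nodes created with `new (arena) Node{...}` end up next to each other inside the arena's buffer instead of being scattered across the general heap.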


To obtain unconventional behavior:

Sometimes you want operators new and delete to do something that the compiler-provided versions don't offer.
For example, you might write a custom operator delete that overwrites deallocated memory with zeros in order to increase the security of application data.
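
A zeroing operator delete could be sketched like this. `Secret` and `secure_wipe` are hypothetical names; the volatile loop is there because an ordinary `memset` right before `free` may be removed by the optimizer as a dead store.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Overwrite n bytes with zeros through a volatile pointer so the
// compiler cannot drop the writes as dead stores.
void secure_wipe(void* p, std::size_t n) noexcept {
    volatile unsigned char* bytes = static_cast<volatile unsigned char*>(p);
    for (std::size_t i = 0; i < n; ++i) bytes[i] = 0;
}

// Hypothetical class holding sensitive data.
struct Secret {
    char key[32];

    static void* operator new(std::size_t size) {
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        return p;
    }

    // The two-parameter form receives the size of the block being freed.
    static void operator delete(void* p, std::size_t size) noexcept {
        if (!p) return;
        secure_wipe(p, size);  // scrub before the memory is returned
        std::free(p);
    }
};
```

Every `delete` of a `Secret` now scrubs the key material before the block goes back to the allocator.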


First of all, there are really a number of different new and delete operators (an arbitrary number, really).

First, there are ::operator new, ::operator new[], ::operator delete and ::operator delete[]. Second, for any class X, there are X::operator new, X::operator new[], X::operator delete and X::operator delete[].

Between these, it's much more common to overload the class-specific operators than the global operators -- it's fairly common for the memory usage of a particular class to follow a specific enough pattern that you can write operators that provide substantial improvements over the defaults. It's generally much more difficult to predict memory usage nearly that accurately or specifically on a global basis.

It's probably also worth mentioning that although operator new and operator new[] are separate from each other (likewise for any X::operator new and X::operator new[]), there is no difference between the requirements for the two. One will be invoked to allocate a single object, and the other to allocate an array of objects, but each still just receives an amount of memory that's needed, and needs to return the address of a block of memory (at least) that large.

Speaking of requirements, it's probably worthwhile to review the other requirements [1]: the global operators must be truly global -- you may not put one inside a namespace or make one static in a particular translation unit. In other words, there are only two levels at which overloads can take place: a class-specific overload or a global overload. In-between points such as "all the classes in namespace X" or "all allocations in translation unit Y" are not allowed. The class-specific operators are required to be static -- but you're not actually required to declare them as static -- they will be static whether you explicitly declare them static or not. Officially, the global operators must return memory aligned so that it can be used for an object of any type. Unofficially, there's a little wiggle-room in one regard: if you get a request for a small block (e.g., 2 bytes) you only really need to provide memory aligned for an object up to that size, since attempting to store anything larger there would lead to undefined behavior anyway.
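
Two of the points above can be seen in a small sketch: the single-object and array forms both just receive a byte count, and the class-specific operators are static even without the keyword. `Widget` and its `last_single` / `last_array` members are invented for the illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

struct Widget {
    int value = 0;
    ~Widget() {}  // non-trivial destructor; arrays usually need bookkeeping

    // No `static` keyword: these members are static regardless.
    void* operator new(std::size_t bytes) {
        last_single = bytes;
        return allocate(bytes);
    }
    void* operator new[](std::size_t bytes) {
        last_array = bytes;  // may exceed count * sizeof(Widget)
        return allocate(bytes);
    }
    void operator delete(void* p) noexcept { std::free(p); }
    void operator delete[](void* p) noexcept { std::free(p); }

    static std::size_t last_single;  // size of the latest single request
    static std::size_t last_array;   // size of the latest array request

private:
    static void* allocate(std::size_t bytes) {
        void* p = std::malloc(bytes ? bytes : 1);
        if (!p) throw std::bad_alloc();
        return p;
    }
};

std::size_t Widget::last_single = 0;
std::size_t Widget::last_array = 0;
```

Calling `Widget::operator new(sizeof(Widget))` without any object compiles, which is only possible because the function is static. After `new Widget[4]`, `last_array` is at least `4 * sizeof(Widget)`; any extra bytes are the implementation's own array bookkeeping.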

Having covered those preliminaries, let's get back to the original question about why you'd want to overload these operators. First, I should point out that the reasons for overloading the global operators tend to be substantially different from the reasons for overloading the class-specific operators.

Since it's more common, I'll talk about the class-specific operators first. The primary reason for class-specific memory management is performance. This commonly comes in either (or both) of two forms: either improving speed, or reducing fragmentation. Speed is improved by the fact that the memory manager will only deal with blocks of a particular size, so it can return the address of any free block rather than spending any time checking whether a block is large enough, splitting a block in two if it's too large, etc. Fragmentation is reduced in (mostly) the same way -- for example, pre-allocating a block large enough for N objects gives exactly the space necessary for N objects; allocating one object's worth of memory will allocate exactly the space for one object, and not a single byte more.
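
To make the fixed-size argument concrete, here is a sketch of a class-specific allocator built on a free list of equally sized slots. `FixedPool`, `Particle` and the capacity of 1024 are assumptions for the example; the pool never grows and is not thread-safe.

```cpp
#include <cstddef>
#include <new>

// Hypothetical pool of equally sized slots with no per-block header.
template <std::size_t SlotSize, std::size_t SlotAlign, std::size_t Capacity>
class FixedPool {
    union Slot {
        Slot* next;                                        // while free
        alignas(SlotAlign) unsigned char bytes[SlotSize];  // while in use
    };
    Slot slots_[Capacity];
    Slot* free_ = nullptr;

public:
    FixedPool() {
        // Link slots in reverse so they are handed out in address order.
        for (std::size_t i = Capacity; i-- > 0;) {
            slots_[i].next = free_;
            free_ = &slots_[i];
        }
    }
    void* allocate() {
        if (!free_) throw std::bad_alloc();
        Slot* slot = free_;  // any free slot fits: no searching, no splitting
        free_ = slot->next;
        return slot;
    }
    void deallocate(void* p) noexcept {
        Slot* slot = static_cast<Slot*>(p);
        slot->next = free_;
        free_ = slot;
    }
};

struct Particle {
    double x, y, z;
    static void* operator new(std::size_t size);
    static void operator delete(void* p, std::size_t size) noexcept;
};

using ParticlePool = FixedPool<sizeof(Particle), alignof(Particle), 1024>;

ParticlePool& particle_pool() {
    static ParticlePool pool;
    return pool;
}

void* Particle::operator new(std::size_t size) {
    // A derived class may arrive here with a different size.
    if (size != sizeof(Particle)) return ::operator new(size);
    return particle_pool().allocate();
}

void Particle::operator delete(void* p, std::size_t size) noexcept {
    if (!p) return;
    if (size != sizeof(Particle)) { ::operator delete(p); return; }
    particle_pool().deallocate(p);
}
```

Allocation and deallocation are each a couple of pointer operations, and a freed slot is reused by the very next `new Particle`.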

There's a much greater variety of reasons for overloading the global memory management operators. Many of these are oriented toward debugging or instrumentation, such as tracking the total memory needed by an application (e.g., in preparation for porting to an embedded system), or debugging memory problems by showing mismatches between allocating and freeing memory. Another common strategy is to allocate extra memory before and after the boundaries of each requested block, and writing unique patterns into those areas. At the end of execution (and possibly other times as well), those areas are examined to see if code has written outside the allocated boundaries. Yet another is to attempt to improve ease of use by automating at least some aspects of memory allocation or deletion, such as with an automated garbage collector.

A non-default global allocator can be used to improve performance as well. A typical case would be replacing a default allocator that was just slow in general (e.g., at least some versions of MS VC++ around 4.x would call the system HeapAlloc and HeapFree functions for every allocation/deletion operation). Another possibility I've seen in practice occurred on Intel processors when using the SSE operations. These operate on 128-bit data. While the operations will work regardless of alignment, speed is improved when the data is aligned to 128-bit boundaries. Some compilers (e.g., MS VC++ again [2]) haven't necessarily enforced alignment to that larger boundary, so even though code using the default allocator would work, replacing the allocator could provide a substantial speed improvement for those operations.
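
An allocator that forces 128-bit alignment can be sketched by over-allocating and rounding the address up. `aligned_allocate`, `aligned_release` and `Vec4` are hypothetical names, and the sketch assumes the alignment is a power of two no smaller than `alignof(void*)`. Since C++17 the `std::align_val_t` overloads of operator new cover the same need.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

constexpr std::size_t kSimdAlign = 16;  // 128-bit boundary

void* aligned_allocate(std::size_t size, std::size_t align) {
    // Room for the payload, worst-case padding and the saved raw pointer.
    void* raw = std::malloc(size + align + sizeof(void*));
    if (!raw) throw std::bad_alloc();
    const std::uintptr_t start =
        reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    const std::uintptr_t aligned =
        (start + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
    // Stash the pointer malloc returned just in front of the user block.
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

void aligned_release(void* p) noexcept {
    if (!p) return;
    std::free(static_cast<void**>(p)[-1]);  // recover the raw pointer
}

// Hypothetical type that is fed to SSE-style operations.
struct Vec4 {
    float v[4];
    static void* operator new(std::size_t size) {
        return aligned_allocate(size, kSimdAlign);
    }
    static void operator delete(void* p) noexcept { aligned_release(p); }
};
```

Every `new Vec4` now returns an address that is a multiple of 16, whatever the default allocator would have done.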


  1. Most of the requirements are covered in §3.7.3 and §18.4 of the C++ standard (or §3.7.4 and §18.6 in C++0x, at least as of N3291).
  2. I feel obliged to point out that I don't intend to pick on Microsoft's compiler -- I doubt it has an unusual number of such problems, but I happen to use it a lot, so I tend to be quite aware of its problems.

It seems worth repeating the list from my answer to "Any reason to overload global new and delete?" here -- see that answer (or indeed other answers to that question) for a more detailed discussion, references, and other reasons. These reasons generally apply to local operator overloads as well as default/global ones, and to C malloc/calloc/realloc/free overloads or hooks as well.

We overload the global new and delete operators where I work for many reasons:

  • pooling all small allocations -- decreases overhead, decreases fragmentation, can increase performance for small-alloc-heavy apps
  • framing allocations with a known lifetime -- ignore all the frees until the very end of this period, then free all of them together (admittedly we do this more with local operator overloads than global)
  • alignment adjustment -- to cacheline boundaries, etc
  • alloc fill -- helping to expose usage of uninitialized variables
  • free fill -- helping to expose usage of previously deleted memory
  • delayed free -- increasing the effectiveness of free fill, occasionally increasing performance
  • sentinels or fenceposts -- helping to expose buffer overruns, underruns, and the occasional wild pointer
  • redirecting allocations -- to account for NUMA, special memory areas, or even to keep separate systems separate in memory (for e.g. embedded scripting languages or DSLs)
  • garbage collection or cleanup -- again useful for those embedded scripting languages
  • heap verification -- you can walk through the heap data structure every N allocs/frees to make sure everything looks ok
  • accounting, including leak tracking and usage snapshots/statistics (stacks, allocation ages, etc)
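
Three of the items above (alloc fill, free fill and delayed free) combine naturally, as in the following sketch. The `dbg` namespace, the `0xCD` / `0xDD` fill bytes, the quarantine depth of four and the `Session` class are all assumptions made for illustration; this is not the code the list refers to, and it is not thread-safe.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

namespace dbg {

constexpr unsigned char kAllocFill = 0xCD;  // marks fresh, unset memory
constexpr unsigned char kFreeFill = 0xDD;   // marks released memory
constexpr std::size_t kQuarantine = 4;      // blocks held before real free

struct Held { void* ptr; std::size_t size; };
Held quarantine[kQuarantine];   // zero-initialized ring buffer
std::size_t next_slot;
std::size_t writes_after_free;  // blocks modified while quarantined

void* allocate(std::size_t size) {
    void* p = std::malloc(size ? size : 1);
    if (!p) throw std::bad_alloc();
    std::memset(p, kAllocFill, size);  // alloc fill
    return p;
}

void release(void* p, std::size_t size) noexcept {
    if (!p) return;
    std::memset(p, kFreeFill, size);   // free fill
    Held& slot = quarantine[next_slot];
    next_slot = (next_slot + 1) % kQuarantine;
    if (slot.ptr) {
        // Evict the oldest block: its fill must still be intact.
        const unsigned char* bytes =
            static_cast<const unsigned char*>(slot.ptr);
        for (std::size_t i = 0; i < slot.size; ++i) {
            if (bytes[i] != kFreeFill) { ++writes_after_free; break; }
        }
        std::free(slot.ptr);           // the delayed, real free
    }
    slot.ptr = p;
    slot.size = size;
}

}  // namespace dbg

// Hypothetical class routed through the debugging allocator.
struct Session {
    int id;
    char name[12];
    static void* operator new(std::size_t size) { return dbg::allocate(size); }
    static void operator delete(void* p, std::size_t size) noexcept {
        dbg::release(p, size);
    }
};
```

A field that reads back as `0xCDCDCDCD` was never initialized, one that reads `0xDDDDDDDD` belongs to a deleted object, and a non-zero `dbg::writes_after_free` means something wrote through a dangling pointer while the block sat in quarantine.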

Many computer architectures require that data of particular types be placed in memory at particular kinds of addresses. For example, an architecture might require that pointers occur at addresses that are a multiple of four (i.e., be four-byte aligned) or that doubles must occur at addresses that are a multiple of eight (i.e., be eight-byte aligned). Failure to follow such constraints can lead to hardware exceptions at run-time. Other architectures are more forgiving, and may allow it to work though reducing the performance.

To clarify: if an architecture requires for instance that double data be eight-byte aligned, then there is nothing to optimize. Any kind of dynamic allocation of the appropriate size (e.g. malloc(size), operator new(size), operator new[](size), new char[size] where size >= sizeof(double)) is guaranteed to be properly aligned. If an implementation doesn't make this guarantee, it is not conforming. Changing operator new to do 'the right thing' in that case would be an attempt at 'fixing' the implementation, not an optimization.
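
The guarantee can be checked directly. The helper names `is_aligned` and `default_allocations_fit_double` below are made up for this sketch; on a conforming implementation the second function returns true.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// True if p is a multiple of the given alignment.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Exercises the allocation forms named above with room for one double
// and reports whether every result is suitably aligned for double.
bool default_allocations_fit_double() {
    void* a = std::malloc(sizeof(double));
    void* b = ::operator new(sizeof(double));
    void* c = ::operator new[](sizeof(double));
    char* d = new char[sizeof(double)];

    const bool ok = a != nullptr &&
                    is_aligned(a, alignof(double)) &&
                    is_aligned(b, alignof(double)) &&
                    is_aligned(c, alignof(double)) &&
                    is_aligned(d, alignof(double));

    std::free(a);
    ::operator delete(b);
    ::operator delete[](c);
    delete[] d;
    return ok;
}
```

If this ever returned false, the implementation would be non-conforming, and a replacement operator new would be a workaround rather than an optimization.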

On the other hand, some architectures allow different (or all) kinds of alignment for one or more data types, but provide different performance guarantees depending on alignment for those same types. An implementation may then return memory (again, assuming a request of appropriate size) that is sub-optimally aligned, and still be conforming. This is what the example is about.