Is there any guarantee of alignment of address return by C++'s new operation?

Most of experienced programmer knows data alignment is important for program's performance. I have seen some programmer wrote program that allocate bigger size of buffer than they need, and use the aligned pointer as begin. I am wondering should I do that in my program, I have no idea is there any guarantee of alignment of address returned by C++'s new operation. So I wrote a little program to test

for(size_t i = 0; i < 100; ++i) {
    char *p = new char[123];
    if(reinterpret_cast<size_t>(p) % 4) {
        cout << "*";
        system("pause");
    }
    cout << reinterpret_cast<void *>(p) << endl;
}
for(size_t i = 0; i < 100; ++i) {
    short *p = new short[123];
    if(reinterpret_cast<size_t>(p) % 4) {
        cout << "*";
        system("pause");
    }
    cout << reinterpret_cast<void *>(p) << endl;
}
for(size_t i = 0; i < 100; ++i) {
    float *p = new float[123];
    if(reinterpret_cast<size_t>(p) % 4) {
        cout << "*";
        system("pause");
    }
    cout << reinterpret_cast<void *>(p) << endl;
}
system("pause");

The compiler I am using is Visual C++ Express 2008. It seems that all addresses the new operation returned are aligned. But I am not sure. So my question is: are there any guarantee? If they do have guarantee, I don't have to align myself, if not, I have to.


The alignment has the following guarantee from the standard (3.7.3.1/2):

The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type and then used to access the object or array in the storage allocated (until the storage is explicitly deallocated by a call to a corresponding deallocation function).

EDIT: Thanks to timday for highlighting a bug in gcc/glibc where the guarantee does not hold.

EDIT 2: Ben's comment highlights an intersting edge case. The requirements on the allocation routines are for those provided by the standard only. If the application has it's own version, then there's no such guarantee on the result.


This is a late answer but just to clarify the situation on Linux - on 64-bit systems memory is always 16-byte aligned:

http://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html

The address of a block returned by malloc or realloc in the GNU system is always a multiple of eight (or sixteen on 64-bit systems).

The new operator calls malloc internally (see ./gcc/libstdc++-v3/libsupc++/new_op.cc) so this applies to new as well.

The implementation of malloc which is part of the glibc basically defines MALLOC_ALIGNMENT to be 2*sizeof(size_t) and size_t is 32bit=4byte and 64bit=8byte on a x86-32 and x86-64 system, respectively.

$ cat ./glibc-2.14/malloc/malloc.c:
...
#ifndef INTERNAL_SIZE_T
#define INTERNAL_SIZE_T size_t
#endif
...
#define SIZE_SZ                (sizeof(INTERNAL_SIZE_T))
...
#ifndef MALLOC_ALIGNMENT
#define MALLOC_ALIGNMENT       (2 * SIZE_SZ)
#endif

C++17 changes the requirements on the new allocator, such that it is required to return a pointer whose alignment is equal to the macro __STDCPP_DEFAULT_NEW_ALIGNMENT__ (which is defined by the implementation, not by including a header).

This is important because this size can be larger than alignof(std::max_align_t). In Visual C++ for example, the maximum regular alignment is 8-byte, but the default new always returns 16-byte aligned memory.

Also, note that if you override the default new with your own allocator, you are required to abide by the __STDCPP_DEFAULT_NEW_ALIGNMENT__ as well.


Incidentally the MS documentation mentions something about malloc/new returning addresses which are 16-byte aligned, but from experimentation this is not the case. I happened to need the 16-byte alignment for a project (to speed up memory copies with enhanced instruction set), in the end I resorted to writing my own allocator...


The platform's new/new[] operator will return pointers with sufficient alignment so that it'll perform good with basic datatypes (double,float,etc.). At least any sensible C++ compiler+runtime should do that.

If you have special alignment requirements like for SSE, then it's probably a good idea use special aligned_malloc functions, or roll your own.