C undefined behavior. Strict aliasing rule, or incorrect alignment? [duplicate]

The code indeed breaks the strict aliasing rule. However, there is not only an aliasing violation, and the crash doesn't happen because of the aliasing violation. It happens because the unsigned short pointer is incorrectly aligned; even the pointer conversion itself is undefined if the result is not suitably aligned.

C11 (draft n1570) Appendix J.2:

1 The behavior is undefined in the following circumstances:

....

  • Conversion between two pointer types produces a result that is incorrectly aligned (6.3.2.3).

With 6.3.2.3p7 saying

[...] If the resulting pointer is not correctly aligned [68] for the referenced type, the behavior is undefined. [...]

unsigned short has alignment requirement of 2 on your implementation (x86-32 and x86-64), which you can test with

_Static_assert(_Alignof(unsigned short) == 2, "alignof(unsigned short) == 2");

However, you're forcing the u16 *key2 to point to an unaligned address:

u16 *key2 = (u16 *) (keyc + 1);  // we've already got undefined behaviour *here*!

There are countless programmers that insist that unaligned access is guaranteed to work in practice on x86-32 and x86-64 everywhere, and there wouldn't be any problems in practice - well, they're all wrong.

Basically what happens is that the compiler notices that

for (size_t i = 0; i < len; ++i)
     hash += key2[i];

can be executed more efficiently using the SIMD instructions if suitably aligned. The values are loaded into the SSE registers using MOVDQA, which requires that the argument is aligned to 16 bytes:

When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.

For cases where the pointer is not suitably aligned at start, the compiler will generate code that will sum the first 1-7 unsigned shorts one by one, until the pointer is aligned to 16 bytes.

Of course if you start with a pointer that points to an odd address, not even adding 7 times 2 will land one to an address that is aligned to 16 bytes. Of course the compiler will not even generate code that will detect this case, as "the behaviour is undefined, if conversion between two pointer types produces a result that is incorrectly aligned" - and ignores the situation completely with unpredictable results, which here means that the operand to MOVDQA will not be properly aligned, which will then crash the program.


It can be easily proven that this can happen even without violating any strict aliasing rules. Consider the following program that consists of 2 translation units (if both f and its caller are placed into one translation unit, my GCC is smart enough to notice that we're using a packed structure here, and doesn't generate code with MOVDQA):

translation unit 1:

#include <stdlib.h>
#include <stdint.h>

size_t f(uint16_t *keyc, size_t len)
{
    size_t hash = len;
    len = len / 2;

    for (size_t i = 0; i < len; ++i)
        hash += keyc[i];
    return hash;
}

translation unit 2

#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <inttypes.h>

size_t f(uint16_t *keyc, size_t len);

struct mystruct {
    uint8_t padding;
    uint16_t contents[100];
} __attribute__ ((packed));

int main(void)
{
    struct mystruct s;
    size_t len;

    srand(time(NULL));
    scanf("%zu", &len);

    char *initializer = (char *)s.contents;
    for (size_t i = 0; i < len; i++)
       initializer[i] = rand();

    printf("out %zu\n", f(s.contents, len));
}

Now compile and link them together:

% gcc -O3 unit1.c unit2.c
% ./a.out
25
zsh: segmentation fault (core dumped)  ./a.out

Notice that there is no aliasing violation there. The only problem is the unaligned uint16_t *keyc.

With -fsanitize=undefined the following error is produced:

unit1.c:10:21: runtime error: load of misaligned address 0x7ffefc2d54f1 for type 'uint16_t', which requires 2 byte alignment
0x7ffefc2d54f1: note: pointer points here
 00 00 00  01 4e 02 c4 e9 dd b9 00  83 d9 1f 35 0e 46 0f 59  85 9b a4 d7 26 95 94 06  15 bb ca b3 c7
              ^ 

It is legal to alias a pointer to an object to a pointer to a char, and then iterate all bytes from the original object.

When a pointer to char actually points to an object (has been obtained through previous operation), it is legal to convert is back to a pointer to the original type, and the standard requires that you get back the original value.

But converting an arbitrary pointer to a char to a pointer to object and dereferencing the obtained pointer violates the strict aliasing rule and invokes undefined behaviour.

So in your code, the following line is UB:

const u16 *key2 = (const u16 *) (keyc + 1); 
// keyc + 1 did not originally pointed to a u16: UB

To provide some more info and common pitfalls to the excellent answer from @Antti Haapala:

TLDR: Access to unaligned data is undefined behavior (UB) in C/C++. Unaligned data is data at an address (aka pointer value) that is not evenly divisible by its alignment (which is usually its size). In (pseudo-)code: bool isAligned(T* ptr){ return (ptr % alignof(T)) == 0; }

This issue arises often when parsing file formats or data sent over network: You have a densely packed struct of different data types. Example would be a protocol like this: struct Packet{ uint16_t len; int32_t data[]; }; (Read as: A 16 bit length followed by len times a 32 bit int as a value). You could now do:

char* raw = receiveData();
int32_t sum = 0;
uint16_t len = *((uint16_t*)raw);
int32_t* data = (int32_t*)(raw2 + 2);
for(size_t i=0; i<len; ++i) sum += data[i];

This does not work! If you assume that raw is aligned (in your mind you could set raw = 0 which is aligned to any size as 0 % n == 0 for all n) then data cannot possibly be aligned (assuming alignment == type size): len is at address 0, so data is at address 2 and 2 % 4 != 0. But the cast tells the compiler "This data is properly aligned" ("... because otherwise it is UB and we never run into UB"). So during optimization the compiler will use SIMD/SSE instructions for faster calculation of the sum and those do crash when given unaligned data.
Sidenote: There are unaligned SSE instructions but they are slower and as the compiler assumes the alignment you promised they are not used here.

You can see this in the example from @Antti Haapala which I shortened and put at godbolt for you to play around with: https://godbolt.org/z/KOfi6V. Watch the "program returned: 255" aka "crashed".

This problem is also pretty common in deserialization routines which look like this:

char* raw = receiveData();
int32_t foo = readInt(raw); raw+=4;
bool foo = readBool(raw); raw+=1;
int16_t foo = readShort(raw); raw+=2;
...

The read* takes care of endianess and is often implemented like this:

int32_t readInt(char* ptr){
  int32_t result = *((int32_t*) ptr);
  #if BIG_ENDIAN
  result = byteswap(result);
  #endif
}

Note how this code dereferences a pointer which pointed to a smaller type which might have a different alignment and you run into the exact some problem.

This problem is so common that even Boost suffered from this through many versions. There is Boost.Endian which provides easy endian types. The C code from godbolt can be easily written likes this:

#include <cstdint>
#include <boost/endian/arithmetic.hpp>


__attribute__ ((noinline)) size_t f(boost::endian::little_uint16_t *keyc, size_t len)
{
    size_t hash = 0;
    for (size_t i = 0; i < len; ++i)
        hash += keyc[i];
    return hash;
}

struct mystruct {
    uint8_t padding;
    boost::endian::little_uint16_t contents[100];
};

int main(int argc, char** argv)
{
    mystruct s;
    size_t len = argc*25;

    for (size_t i = 0; i < len; i++)
       s.contents[i] = i * argc;

    return f(s.contents, len) != 300;
}

The type little_uint16_t is basically just some chars with an implicit conversion from/to uint16_t with a byteswap if the current machines endianess is BIG_ENDIAN. Under the hood the code used by Boost:endian was similar to this:

class little_uint16_t{
  char buffer[2];
  uint16_t value(){
    #if IS_x86
      uint16_t value = *reinterpret_cast<uint16_t*>(buffer);
    #else
    ...
    #endif
    #if BIG_ENDIAN
    swapbytes(value);
    #endif
    return value;
};

It used the knowledge that on x86 architectures unaligned access is possible. A load from an unaligned address was just a bit slower, but even on assembler level the same as the load from an aligned address.

However "possible" doesn't mean valid. If the compiler replaced the "standard" load by a SSE instruction then this fails as can be seen on godbolt. This went unnoticed for a long time because those SSE instructions are just used when processing large chunks of data with the same operation, e.g. adding an array of values which is what I did for this example. This was fixed in Boost 1.69 by using memcopy which can be translated to a "standard" load instruction in ASM which supports aligned and unaligned data on x86, so there is no slowdown compared to the cast version. But it cannot be translated into aligned SSE instructions without further checks.

Takeaway: Don't use shortcuts with casts. Be suspicious of every cast especially when casting from a smaller type and check that the alignment cannot be wrong or use the safe memcpy.