Why does optimisation kill this function?

Solution 1:

In C++, pointer arguments are assumed not to alias (except char*) if they point to fundamentally different types ("strict aliasing" rules). This allows some optimizations.

Here, u64 tmp is never modified as u64.
A content of u32* is modified but may be unrelated to 'u64 tmp' so may be seen as nop for u64 tmp.

Solution 2:

This code violates the strict aliasing rules which makes it illegal to access an object through a pointer of a different type, although access through a *char ** is allowed. The compiler is allowed to assume that pointers of different types do not point to the same memory and optimize accordingly. It also means the code invokes undefined behavior and could really do anything.

One of the best references for this topic is Understanding Strict Aliasing and we can see the first example is in a similar vein to the OP's code:

uint32_t swap_words( uint32_t arg )
{
  uint16_t* const sp = (uint16_t*)&arg;
  uint16_t        hi = sp[0];
  uint16_t        lo = sp[1];

  sp[1] = hi;
  sp[0] = lo;

 return (arg);
} 

The article explains this code violates strict aliasing rules since sp is an alias of arg but they have different types and says that although it will compile, it is likely arg will be unchanged after swap_words returns. Although with simple tests, I am unable to reproduce that result with either the code above nor the OPs code but that does not mean anything since this is undefined behavior and therefore not predictable.

The article goes on to talk about many different cases and presents several working solution including type-punning through a union, which is well-defined in C991 and may be undefined in C++ but in practice is supported by most major compilers, for example here is gcc's reference on type-punning. The previous thread Purpose of Unions in C and C++ goes into the gory details. Although there are many threads on this topic, this seems to do the best job.

The code for that solution is as follows:

typedef union
{
  uint32_t u32;
  uint16_t u16[2];
} U32;

uint32_t swap_words( uint32_t arg )
{
  U32      in;
  uint16_t lo;
  uint16_t hi;

  in.u32    = arg;
  hi        = in.u16[0];
  lo        = in.u16[1];
  in.u16[0] = lo;
  in.u16[1] = hi;

  return (in.u32);
}

For reference the relevant section from the C99 draft standard on strict aliasing is 6.5 Expressions paragraph 7 which says:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:76)

— a type compatible with the effective type of the object,

— a qualified version of a type compatible with the effective type of the object,

— a type that is the signed or unsigned type corresponding to the effective type of the object,

— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

— a character type.

and footnote 76 says:

The intent of this list is to specify those circumstances in which an object may or may not be aliased.

and the relevant section from the C++ draft standard is 3.10 Lvalues and rvalues paragraph 10

The article Type-punning and strict-aliasing gives a gentler but less complete introduction to the topic and C99 revisited gives a deep analysis of C99 and aliasing and is not light reading. This answer to Accessing inactive union member - undefined? goes over the muddy details of type-punning through a union in C++ and is not light reading either.


Footnotes:

  1. Quoting comment by Pascal Cuoq: [...]C99 that was initially clumsily worded, appearing to make type-punning through unions undefined. In reality, type-punning though unions is legal in C89, legal in C11, and it was legal in C99 all along although it took until 2004 for the committee to fix incorrect wording, and the subsequent release of TC3. open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm

Solution 3:

g++ (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1:

> g++ -Wall -std=c++11 -O0 -o sample sample.cpp

> g++ -Wall -std=c++11 -O3 -o sample sample.cpp
sample.cpp: In function ‘uint64_t Swap_64(uint64_t)’:
sample.cpp:10:19: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     (*(uint32_t*)&tmp)       = Swap_32(*(((uint32_t*)&x)+1));
                   ^
sample.cpp:11:54: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     (*(((uint32_t*)&tmp)+1)) = Swap_32(*(uint32_t*) &x);
                                                      ^

Clang 3.4 doesn't warn in any optimization level, which is curious...