Unions and type-punning
To reiterate, type-punning through unions is perfectly fine in C (but not in C++). In contrast, using pointer casts to do so violates C99 strict aliasing. It is also problematic because different types may have different alignment requirements, so a misaligned access could raise a SIGBUS if you do it wrong. With unions, this is never a problem.
The relevant quotes from the C standards are:
C89 section 3.3.2.3 §5:
if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined
C11 section 6.5.2.3 §3:
A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member
with the following footnote 95:
If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.
This should be perfectly clear.
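As a minimal sketch of what footnote 95 describes, the following C stores a float and reads it back as an integer through a union. The names float_bits and float_to_bits are made up for illustration, and the code assumes a 32-bit float, which the standard does not guarantee.

```c
#include <stdint.h>

/* Assumption for this sketch: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "sketch assumes a 32-bit float");

/* Hypothetical union: both members share the same storage. */
union float_bits {
    float    f;
    uint32_t u;
};

/* Returns the object representation of f, reinterpreted as a uint32_t. */
static uint32_t float_to_bits(float f)
{
    union float_bits pun;
    pun.f = f;     /* store through one member */
    return pun.u;  /* read through the other: the bytes are reinterpreted */
}
```

On an implementation that uses IEEE 754 binary32 for float, float_to_bits(1.0f) yields 0x3F800000. uint32_t has no trap representations, so the caveat in the footnote's last sentence does not bite in this direction.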
James is confused because C11 section 6.7.2.1 §16 reads:
The value of at most one of the members can be stored in a union object at any time.
This seems contradictory, but it is not: In contrast to C++, in C, there is no concept of active member and it's perfectly fine to access the single stored value through an expression of an incompatible type.
See also C11 annex J.1 §1:
The values of bytes that correspond to union members other than the one last stored into [are unspecified].
In C99, this used to read:
The value of a union member other than the last one stored into [is unspecified]
This was incorrect. As the annex isn't normative, it did not rate its own TC and had to wait until the next standard revision to get fixed.
GNU extensions to standard C++ (and to C90) do explicitly allow type-punning with unions. Other compilers that don't support GNU extensions may also support union type-punning, but it's not part of the base language standard.
The original purpose of unions was to save space when a single object needs to be able to represent different types, what we call a variant type; see Boost.Variant for a good example of this.
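For the space-saving use, a hand-rolled variant in C might look roughly like the sketch below. This is not how Boost.Variant is implemented; it only shows the idea of a tag plus a union, and every name in it (value_kind, value, value_from_int, value_from_double, value_to_double) is hypothetical.

```c
/* Tag recording which union member currently holds the value. */
enum value_kind { VALUE_INT, VALUE_DOUBLE };

struct value {
    enum value_kind kind;
    union {
        long   i;
        double d;
    } as;  /* the members overlap, so storage is shared rather than summed */
};

static struct value value_from_int(long i)
{
    struct value v;
    v.kind = VALUE_INT;
    v.as.i = i;
    return v;
}

static struct value value_from_double(double d)
{
    struct value v;
    v.kind = VALUE_DOUBLE;
    v.as.d = d;
    return v;
}

/* Reads only the member that the tag says was stored, so no punning occurs. */
static double value_to_double(struct value v)
{
    switch (v.kind) {
    case VALUE_INT:    return (double)v.as.i;
    case VALUE_DOUBLE: return v.as.d;
    }
    return 0.0;  /* not reached for a valid tag */
}
```

Because the tag is always checked before a member is read, this use never reads a member other than the one last stored, which keeps it clear of the punning debate entirely.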
The other common use is type punning. The validity of this is debated, but in practice most compilers support it; for example, gcc documents its support:
The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above works as expected.
Note that it says even with -fstrict-aliasing, type-punning is allowed, which indicates there is an aliasing issue at play.
Pascal Cuoq has argued that defect report 283 clarified this was allowed in C. Defect report 283 added the following footnote as clarification:
If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
In C11 that would be footnote 95.
However, in the std-discussion mail group topic Type Punning via a Union, the argument is made that this is underspecified, which seems reasonable since DR 283 did not add new normative wording, just a footnote:
This is, in my opinion, an underspecified semantic quagmire in C. Consensus has not been reached between implementors and the C committee as to exactly which cases have defined behavior and which do not[...]
In C++ it is unclear whether it is defined behavior or not.
This discussion also covers at least one reason why allowing type punning through a union is undesirable:
[...]the C standard's rules break the type-based alias analysis optimizations which current implementations perform.
That is, it breaks some optimizations. The second argument against it is that using memcpy should generate identical code, does not break optimizations, and is well-defined behavior. For example, this:
std::int64_t n;
std::memcpy(&n, &d, sizeof d);
instead of this:
union u1
{
  std::int64_t n;
  double d ;
} ;
u1 u ;
u.d = d ;
std::int64_t n = u.n ;  // read the bits back through the other member
We can see using godbolt that this does generate identical code, and the argument is made that if your compiler does not generate identical code, it should be considered a bug:
If this is true for your implementation, I suggest you file a bug on it. Breaking real optimizations (anything based on type-based alias analysis) in order to work around performance issues with some particular compiler seems like a bad idea to me.
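To make the comparison concrete, here is a rough C sketch of both approaches side by side. The function names bits_via_memcpy and bits_via_union are invented for this example, and the code assumes double and int64_t have the same size.

```c
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: double and int64_t have the same size. */
_Static_assert(sizeof(double) == sizeof(int64_t),
               "sketch assumes a 64-bit double");

/* Copies the bytes of d into an integer; no aliasing question arises. */
static int64_t bits_via_memcpy(double d)
{
    int64_t n;
    memcpy(&n, &d, sizeof d);
    return n;
}

/* Same bytes, read through a union; relies on the union reading rule
   discussed above (C11 footnote 95). */
static int64_t bits_via_union(double d)
{
    union {
        int64_t n;
        double  d;
    } u;
    u.d = d;
    return u.n;
}
```

Both functions should return the same value for any input; with IEEE 754 binary64 doubles, passing 1.0 gives 0x3FF0000000000000. Whether a given compiler emits the same instructions for the two is something to check on that compiler rather than assume.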
The blog post Type Punning, Strict Aliasing, and Optimization also comes to a similar conclusion.
The undefined behavior mailing list discussion Type punning to avoid copying covers a lot of the same ground, and it shows how grey the territory can be.