Is a^a or a-a undefined behaviour if a is not initialized?
Consider this program:
#include <stdio.h>
int main(void)
{
    unsigned int a;
    printf("%u %u\n", a^a, a-a);
    return 0;
}
Is it undefined behaviour?
On the face of it, a is an uninitialized variable, which points to undefined behaviour. But a^a and a-a are equal to 0 for all values of a, at least I think that is the case. Is there some way to argue that the behaviour is well defined?
Solution 1:
In C11:
- It's explicitly undefined according to 6.3.2.1/2 if a never has its address taken (quoted below).
- It could be a trap representation (which causes UB when accessed). 6.2.6.1/5:
Certain object representations need not represent a value of the object type.
Unsigned ints can have trap representations (e.g. if the type has 15 value bits and 1 parity bit, accessing a could cause a parity fault).
6.2.4/6 says that the initial value is indeterminate and the definition of that under 3.19.2 is either an unspecified value or a trap representation.
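To make the parity example concrete, here is a small simulation of such a word format in ordinary C. It is only a sketch under stated assumptions: the 16-bit layout (15 value bits plus an even-parity bit in bit 15) and the helper names encode_word and is_trap_representation are hypothetical, not taken from the Standard or from any real machine. The code never reads an uninitialized object; it only classifies bit patterns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical word format: bits 0..14 hold the value, bit 15 is an
   even-parity bit. A pattern with an odd number of set bits is invalid. */

/* Count the set bits in a 16-bit pattern. */
static unsigned popcount16(uint16_t w)
{
    unsigned n = 0;
    while (w != 0) {
        n += w & 1u;
        w >>= 1;
    }
    return n;
}

/* Build a valid word from a 15-bit value by choosing the parity bit so
   that the total number of set bits is even. */
uint16_t encode_word(uint16_t value15)
{
    uint16_t v = value15 & 0x7FFFu;
    unsigned parity = popcount16(v) & 1u;
    return (uint16_t)(v | (parity << 15));
}

/* In this model, a raw pattern with odd parity corresponds to no value
   of the type, which is what the Standard calls a trap representation. */
bool is_trap_representation(uint16_t raw)
{
    return (popcount16(raw) & 1u) != 0;
}
```

In this model, encode_word(0x7FFF) yields 0xFFFF, which is valid, while a raw pattern such as 0x0001 has odd parity and would count as a trap representation. An uninitialized object on such a machine could hold either kind of pattern.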
Further: in C11 6.3.2.1/2, as pointed out by Pascal Cuoq:
If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
This doesn't have the exception for character types, so this clause appears to supersede the preceding discussion; accessing a is immediately undefined even if no trap representations exist. This clause was added to C11 to support Itanium CPUs, which do actually have a trap state for registers.
Systems without trap representations: But what if we throw in &a; so that 6.3.2.1/2's objection no longer applies, and we are on a system that is known to have no trap representations? Then the value is an unspecified value.
The definition of unspecified value in 3.19.3 is a bit vague; however, it is clarified by DR 451, which concludes:
- An uninitialized value under the conditions described can appear to change its value.
- Any operation performed on indeterminate values will have an indeterminate value as a result.
- Library functions will exhibit undefined behavior when used on indeterminate values.
- These answers are appropriate for all types that do not have trap representations.
Under this resolution, int a; &a; int b = a - a; results in b still having an indeterminate value.
Note that if the indeterminate value is not passed to a library function, we are still in the realm of unspecified behaviour (not undefined behaviour). The results may be weird, e.g. if ( j != j ) foo(); could call foo, but the demons must remain ensconced in the nasal cavity.
Solution 2:
Yes, it is undefined behavior.
Firstly, any uninitialized variable can hold a "broken" (aka "trap") representation. Even a single attempt to access that representation triggers undefined behavior. Moreover, even objects of non-trapping types (like unsigned char) can still acquire special platform-dependent states (like NaT, Not-A-Thing, on Itanium) that might appear as a manifestation of their "indeterminate value".
Secondly, an uninitialized variable is not guaranteed to have a stable value. Two sequential accesses to the same uninitialized variable can read completely different values, which is why, even if both accesses in a - a are "successful" (not trapping), it is still not guaranteed that a - a will evaluate to zero.
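For contrast, here is a sketch of the well-defined case: once the operand holds a determinate value, both expressions must yield zero for unsigned int, because each operand is the same value and unsigned arithmetic cannot overflow. The helper names xor_self and sub_self are hypothetical and exist only for illustration.

```c
#include <limits.h>

/* Hypothetical helpers. Within each function the parameter is read twice,
   but it is a determinate value copied from the caller's argument, so both
   reads observe the same value. */
unsigned int xor_self(unsigned int a)
{
    return a ^ a;   /* every bit cancels against itself */
}

unsigned int sub_self(unsigned int a)
{
    return a - a;   /* unsigned subtraction, defined for all values */
}
```

Calling either helper with any determinate argument, such as 0u or UINT_MAX, returns 0. The guarantee comes from the operand having a value, not from the algebraic identity alone.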
Solution 3:
If an object has automatic storage duration and its address is not taken, attempting to read it will yield Undefined Behavior. Taking the address of such an object and using pointers to unsigned char to read out its bytes is guaranteed by the Standard to yield values of type unsigned char, but not all compilers adhere to the Standard in that regard. ARM GCC 5.1, for example, when given:
#include <stdint.h>
#include <string.h>
struct q { uint16_t x,y; };
volatile uint16_t zz;
int32_t foo(uint32_t x, uint32_t y)
{
    struct q temp1, temp2;
    temp1.x = 3;
    if (y & 1)
        temp1.y = zz;
    memmove(&temp2, &temp1, sizeof temp1);
    return temp2.y;
}
will generate code that will return x if y is zero, even if x is outside the range 0-65535. The Standard makes clear that unsigned character reads of an indeterminate value are guaranteed to yield a value within the range of unsigned char, and the behavior of memmove is defined as equivalent to a sequence of character reads and writes. Thus, temp2 should have a value that could be stored into it via a sequence of character writes, but gcc is deciding to replace the memmove with an assignment and ignore the fact that the code took the address of temp1 and temp2.
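The byte-wise model that this argument appeals to can be sketched as follows. This is an illustration only: byte_copy is a hypothetical stand-in for memmove restricted to non-overlapping objects, copy_and_read_y is a made-up name, and the source struct is fully initialized so that the example itself reads no indeterminate values.

```c
#include <stddef.h>
#include <stdint.h>

struct q { uint16_t x, y; };

/* Hypothetical stand-in for memmove on non-overlapping objects:
   one unsigned char read and one unsigned char write per byte. */
static void byte_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];   /* each read yields a value in 0..UCHAR_MAX */
}

/* Copy an initialized struct q byte by byte and return the y member of
   the copy. Whatever bytes arrive in dst.y, reading it as a uint16_t
   can only produce a value in the range 0..65535. */
int32_t copy_and_read_y(uint16_t x, uint16_t y)
{
    struct q src = { x, y };
    struct q dst;
    byte_copy(&dst, &src, sizeof src);
    return dst.y;
}
```

Under this model, the return value of the original foo would have to be a value that two character writes could have placed in temp2.y, which rules out anything outside 0-65535.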
Having a means of forcing a compiler to regard a variable as holding an arbitrary value of its type, in cases where any such value would be equally acceptable, would be helpful, but the Standard doesn't specify a clean means of doing so (save for storing some particular value, which would work but would often be needlessly slow). Even operations which should logically force a variable to hold a value that would be representable as some combination of bits cannot be relied upon to work on all compilers. Consequently, nothing useful can be guaranteed about such variables.
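The "store some particular value" workaround mentioned above might look like the following variant of foo. It is a sketch, not something the Standard or the compiler vendor prescribes: the name foo_defined and the choice of 0 as the placeholder are assumptions, and the extra store is exactly the cost described as needlessly slow.

```c
#include <stdint.h>
#include <string.h>

struct q { uint16_t x, y; };
volatile uint16_t zz;

/* Variant of foo in which temp1.y always holds a determinate value
   before the struct is copied. */
int32_t foo_defined(uint32_t x, uint32_t y)
{
    struct q temp1, temp2;
    (void)x;            /* unused, kept to match the signature of foo */
    temp1.x = 3;
    temp1.y = 0;        /* placeholder store; any uint16_t value would do */
    if (y & 1)
        temp1.y = zz;
    memmove(&temp2, &temp1, sizeof temp1);
    return temp2.y;     /* always within 0..65535 */
}
```

With the placeholder in place, every byte copied by memmove has a determinate value, so the result no longer depends on how the compiler treats indeterminate values.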