Why are these constructs using pre and post-increment undefined behavior?
#include <stdio.h>
int main(void)
{
int i = 0;
i = i++ + ++i;
printf("%d\n", i); // 3
i = 1;
i = (i++);
printf("%d\n", i); // 2 Should be 1, no ?
volatile int u = 0;
u = u++ + ++u;
printf("%d\n", u); // 1
u = 1;
u = (u++);
printf("%d\n", u); // 2 Should also be one, no ?
register int v = 0;
v = v++ + ++v;
printf("%d\n", v); // 3 (Should be the same as u ?)
int w = 0;
printf("%d %d\n", ++w, w); // shouldn't this print 1 1
int x[2] = { 5, 8 }, y = 0;
x[y] = y ++;
printf("%d %d\n", x[0], x[1]); // shouldn't this print 0 8? or 5 0?
}
Solution 1:
C has the concept of undefined behavior, i.e. some language constructs are syntactically valid but you can't predict the behavior when the code is run.
As far as I know, the standard doesn't explicitly say why the concept of undefined behavior exists. In my mind, it's simply because the language designers wanted there to be some leeway in the semantics, instead of i.e. requiring that all implementations handle integer overflow in the exact same way, which would very likely impose serious performance costs, they just left the behavior undefined so that if you write code that causes integer overflow, anything can happen.
So, with that in mind, why are these "issues"? The language clearly says that certain things lead to undefined behavior. There is no problem, there is no "should" involved. If the undefined behavior changes when one of the involved variables is declared volatile
, that doesn't prove or change anything. It is undefined; you cannot reason about the behavior.
Your most interesting-looking example, the one with
u = (u++);
is a text-book example of undefined behavior (see Wikipedia's entry on sequence points).
Solution 2:
Most of the answers here quoted from C standard emphasizing that the behavior of these constructs are undefined. To understand why the behavior of these constructs are undefined, let's understand these terms first in the light of C11 standard:
Sequenced: (5.1.2.3)
Given any two evaluations
A
andB
, ifA
is sequenced beforeB
, then the execution ofA
shall precede the execution ofB
.
Unsequenced:
If
A
is not sequenced before or afterB
, thenA
andB
are unsequenced.
Evaluations can be one of two things:
- value computations, which work out the result of an expression; and
- side effects, which are modifications of objects.
Sequence Point:
The presence of a sequence point between the evaluation of expressions
A
andB
implies that every value computation and side effect associated withA
is sequenced before every value computation and side effect associated withB
.
Now coming to the question, for the expressions like
int i = 1;
i = i++;
standard says that:
6.5 Expressions:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. [...]
Therefore, the above expression invokes UB because two side effects on the same object i
is unsequenced relative to each other. That means it is not sequenced whether the side effect by assignment to i
will be done before or after the side effect by ++
.
Depending on whether assignment occurs before or after the increment, different results will be produced and that's the one of the case of undefined behavior.
Lets rename the i
at left of assignment be il
and at the right of assignment (in the expression i++
) be ir
, then the expression be like
il = ir++ // Note that suffix l and r are used for the sake of clarity.
// Both il and ir represents the same object.
An important point regarding Postfix ++
operator is that:
just because the
++
comes after the variable does not mean that the increment happens late. The increment can happen as early as the compiler likes as long as the compiler ensures that the original value is used.
It means the expression il = ir++
could be evaluated either as
temp = ir; // i = 1
ir = ir + 1; // i = 2 side effect by ++ before assignment
il = temp; // i = 1 result is 1
or
temp = ir; // i = 1
il = temp; // i = 1 side effect by assignment before ++
ir = ir + 1; // i = 2 result is 2
resulting in two different results 1
and 2
which depends on the sequence of side effects by assignment and ++
and hence invokes UB.
Solution 3:
I think the relevant parts of the C99 standard are 6.5 Expressions, §2
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
and 6.5.16 Assignment operators, §4:
The order of evaluation of the operands is unspecified. If an attempt is made to modify the result of an assignment operator or to access it after the next sequence point, the behavior is undefined.
Solution 4:
Just compile and disassemble your line of code, if you are so inclined to know how exactly it is you get what you are getting.
This is what I get on my machine, together with what I think is going on:
$ cat evil.c
void evil(){
int i = 0;
i+= i++ + ++i;
}
$ gcc evil.c -c -o evil.bin
$ gdb evil.bin
(gdb) disassemble evil
Dump of assembler code for function evil:
0x00000000 <+0>: push %ebp
0x00000001 <+1>: mov %esp,%ebp
0x00000003 <+3>: sub $0x10,%esp
0x00000006 <+6>: movl $0x0,-0x4(%ebp) // i = 0 i = 0
0x0000000d <+13>: addl $0x1,-0x4(%ebp) // i++ i = 1
0x00000011 <+17>: mov -0x4(%ebp),%eax // j = i i = 1 j = 1
0x00000014 <+20>: add %eax,%eax // j += j i = 1 j = 2
0x00000016 <+22>: add %eax,-0x4(%ebp) // i += j i = 3
0x00000019 <+25>: addl $0x1,-0x4(%ebp) // i++ i = 4
0x0000001d <+29>: leave
0x0000001e <+30>: ret
End of assembler dump.
(I... suppose that the 0x00000014 instruction was some kind of compiler optimization?)
Solution 5:
The behavior can't really be explained because it invokes both unspecified behavior and undefined behavior, so we can not make any general predictions about this code, although if you read Olve Maudal's work such as Deep C and Unspecified and Undefined sometimes you can make good guesses in very specific cases with a specific compiler and environment but please don't do that anywhere near production.
So moving on to unspecified behavior, in draft c99 standard section6.5
paragraph 3 says(emphasis mine):
The grouping of operators and operands is indicated by the syntax.74) Except as specified later (for the function-call (), &&, ||, ?:, and comma operators), the order of evaluation of subexpressions and the order in which side effects take place are both unspecified.
So when we have a line like this:
i = i++ + ++i;
we do not know whether i++
or ++i
will be evaluated first. This is mainly to give the compiler better options for optimization.
We also have undefined behavior here as well since the program is modifying variables(i
, u
, etc..) more than once between sequence points. From draft standard section 6.5
paragraph 2(emphasis mine):
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
it cites the following code examples as being undefined:
i = ++i + 1;
a[i++] = i;
In all these examples the code is attempting to modify an object more than once in the same sequence point, which will end with the ;
in each one of these cases:
i = i++ + ++i;
^ ^ ^
i = (i++);
^ ^
u = u++ + ++u;
^ ^ ^
u = (u++);
^ ^
v = v++ + ++v;
^ ^ ^
Unspecified behavior is defined in the draft c99 standard in section 3.4.4
as:
use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance
and undefined behavior is defined in section 3.4.3
as:
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
and notes that:
Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).