In C++11, does `i += ++i + 1` exhibit undefined behavior?
Solution 1:
About the description of `i = ++i + 1`

> I gather that the subtle explanation is that
> (1) the expression `++i` returns an lvalue but `+` takes prvalues as operands, so a conversion from lvalue to prvalue must be performed;

Probably, see CWG active issue 1642.

> this involves obtaining the current value of that lvalue (rather than one more than the old value of `i`) and must therefore be sequenced after the side effect from the increment (i.e., updating `i`)

The sequencing here is defined for the increment (indirectly, via `+=`, see (a)): the side effect of `++` (the modification of `i`) is sequenced before the value computation of the whole expression `++i`. The latter refers to computing the result of `++i`, not to loading the value of `i`.
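As a minimal, well-defined illustration of this point (the helper name `preincrement_demo` is hypothetical), the result of `++i` can be bound to a reference: it designates `i` itself, and reading through it afterwards observes the incremented value.

```cpp
#include <utility>

// Hypothetical helper: checks that ++i designates i itself and that reading
// through it yields the incremented value.
std::pair<bool, int> preincrement_demo() {
    int i = 0;
    int& r = ++i;                    // the increment's side effect completes before r is bound
    bool same_object = (&r == &i);   // ++i is an lvalue referring to i, not a copy
    int value = r;                   // lvalue-to-rvalue conversion: fetches the current value of i
    return {same_object, value};
}
```

Calling `preincrement_demo()` should give `{true, 1}`.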

> (2) the LHS of the assignment is also an lvalue, so its value evaluation does not involve fetching the current value of `i`; while this value computation is unsequenced w.r.t. the value computation of the RHS, this poses no problem

I don't think that's properly defined in the Standard, but I'd agree.

> (3) the value computation of the assignment itself involves updating `i` (again),

The value computation of `i = expr` is only required when you use the result, e.g. `int x = (i = expr);` or `(i = expr) = 42;`. The value computation itself does not modify `i`.

The modification of `i` in the expression `i = expr` that happens because of the `=` is called the side effect of `=`. This side effect is sequenced before the value computation of `i = expr`; put the other way around, the value computation of `i = expr` is sequenced after the side effect of the assignment in `i = expr`.

In general, the value computations of the operands of an expression are sequenced before the side effect of that expression, of course.
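The two uses of the assignment's result mentioned above can be sketched as follows, assuming C++11 or later (the function name `assignment_result_demo` is hypothetical). Both lines are well-defined, because the inner assignment's side effect is sequenced before its value computation.

```cpp
#include <utility>

// Hypothetical helper: shows that the value computation of (i = expr)
// yields the lvalue i, which can be read or assigned again.
std::pair<int, int> assignment_result_demo() {
    int i = 0;
    int x = (i = 5);   // uses the value computation of (i = 5): x == 5
    (i = 7) = 42;      // the result of (i = 7) is the lvalue i, which is then assigned 42
    return {x, i};
}
```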

> but is sequenced after the value computation of its RHS, and hence after the previous update to `i`; no problem.

The side effect of the assignment `i = expr` is sequenced after the value computations of the operands `i` (A) and `expr` of the assignment.

The `expr` in this case is a `+`-expression: `expr1 + 1`. The value computation of this expression is sequenced after the value computations of its operands `expr1` and `1`.

The `expr1` here is `++i`. The value computation of `++i` is sequenced after the side effect of `++i` (the modification of `i`) (B).

That's why `i = ++i + 1` is safe: there is a chain of sequenced-before relations from the side effect on `i` in (B), through the value computations of `++i` and of `expr1 + 1`, to the side effect of the assignment. The value computation of `i` in (A) only determines the identity of the object, so it needs no ordering relative to (B).
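A small check of this conclusion, assuming a C++11-or-later compiler; the function name `assign_preincrement_plus_one` is hypothetical. Starting from 0, the increment yields 1, the addition yields 2, and the assignment stores 2.

```cpp
// Hypothetical helper; well-defined since C++11 per the sequencing chain above.
int assign_preincrement_plus_one(int start) {
    int i = start;
    i = ++i + 1;   // increment (start + 1), then add 1, then assign: start + 2
    return i;
}
```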
(a) The Standard defines `++expr` in terms of `expr += 1`, which is defined as `expr = expr + 1` with `expr` being evaluated only once.

For this `expr = expr + 1`, we therefore have only one value computation of `expr`. The side effect of `=` is sequenced before the value computation of the whole `expr = expr + 1`, and it's sequenced after the value computations of the operands `expr` (LHS) and `expr + 1` (RHS).

This corresponds to my claim that for `++expr`, the side effect is sequenced before the value computation of `++expr`.
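The equivalence can be spot-checked for built-in arithmetic types with a hypothetical template helper: both `++x` and `x += 1` update `x` and yield an lvalue referring to `x`. This only compares results; it says nothing about sequencing.

```cpp
// Hypothetical check: for a built-in arithmetic type, ++x behaves like x += 1
// (both update x and yield an lvalue referring to x).
template <typename T>
bool preincrement_matches_compound(T start) {
    T a = start;
    T b = start;
    T& ra = ++a;          // lvalue referring to a
    T& rb = (b += 1);     // lvalue referring to b
    return &ra == &a && &rb == &b && a == b;
}
```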
About `i += ++i + 1`

> Does the value computation of `i += ++i + 1` involve undefined behavior? Since the LHS of `+=` is still an lvalue (and its RHS still a prvalue), the same reasoning as above applies as far as (1) and (2) are concerned; as for (3) the value computation of the `+=` operator now must both fetch the current value of `i`, and then (obviously sequenced after it, even if the standard does not say so explicitly, or otherwise the execution of such operators would always invoke undefined behavior) perform the addition of the RHS and store the result back into `i`.

I think this is the problem: the addition of `i` in the LHS of `i +=` to the result of `++i + 1` requires knowing the value of `i`, which is a value computation (it can mean loading the value of `i`). This value computation is unsequenced with respect to the modification performed by `++i`. This is essentially what you say in your alternative description, following the rewrite mandated by the Standard, `i += expr` -> `i = i + expr`. Here, the value computation of `i` within `i + expr` is unsequenced with respect to the evaluation of `expr`, including the side effect of `++i`. That's where you get UB.
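One way to see why the unsequenced read matters is to write the two plausible intents as separate statements, each of which is sequenced before the next. The function names are hypothetical; starting from 0 they produce 3 and 2 respectively, so the original expression has no single natural meaning.

```cpp
// Intent 1 (hypothetical): perform the increment first, then add to the *updated* i.
int increment_then_add(int start) {
    int i = start;
    int rhs = ++i + 1;   // i == start + 1, rhs == start + 2
    i += rhs;            // reads the updated i: (start + 1) + (start + 2)
    return i;
}

// Intent 2 (hypothetical): add to the *old* value of i.
int add_to_old_value(int start) {
    int i = start;
    int old = i;         // fetch the value before any modification
    int rhs = ++i + 1;   // rhs == start + 2
    i = old + rhs;       // start + (start + 2)
    return i;
}
```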
Please note that a value computation can have two kinds of result: the "address" of an object, or the value of an object. In an expression `i = 42`, the value computation of the LHS "produces the address" of `i`; that is, the compiler needs to figure out where to store the RHS (under the rules of observable behaviour of the abstract machine). In an expression `i + 42`, the value computation of `i` produces the value. In the above paragraph, I was referring to the second kind, hence [intro.execution]p15 applies:

> If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
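The two kinds of value computation can be told apart in well-defined code (the helper name `identity_vs_value` is hypothetical): binding a reference only determines which object `i` is, while initializing an `int` from `i` fetches its value at that moment.

```cpp
#include <utility>

// Hypothetical illustration: identity (glvalue) vs. fetched value (prvalue).
std::pair<int, int> identity_vs_value() {
    int i = 5;
    int& identity = i;   // only determines which object i is
    int value = i;       // lvalue-to-rvalue conversion fetches 5 now
    i = 42;              // later modification
    return {identity, value};  // identity still designates i (now 42); value stays 5
}
```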
Another approach for `i += ++i + 1`

> the value computation of the `+=` operator now must both fetch the current value of `i`, and then [...] perform the addition of the RHS

The RHS is `++i + 1`. Computing the result of this expression (the value computation) is unsequenced with respect to the value computation of `i` from the LHS. So the word "then" in this sentence is misleading: of course, it must first load `i` and then add the result of the RHS to it. But there's no order between the side effect of the RHS and the value computation that gets the value of the LHS. For example, you could get for the LHS either the old or the new value of `i`, as modified by the RHS.

In general, an unsequenced store and load of the same object behave like a data race (a store and a "concurrent" load), which leads to undefined behaviour.
Addressing the addendum

> using a fictive `|||` operator to designate unsequenced evaluations, one might try to define `E op= F;` (with int operands for simplicity) as equivalent to `{ int& L=E ||| int R=F; L = L + R; }`, but then the example no longer has UB.

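The definition proposed in the addendum can be approximated with an ordinary function, assuming `int` operands: binding the reference parameter only determines the identity of `i`, and the left operand's value is fetched inside the body, after the right argument has been fully evaluated. This is a sketch with hypothetical names, not the standard's definition; it is well-defined and yields 3 when starting from 0, which is exactly why it does not model the built-in `+=`.

```cpp
// Hypothetical model of "{ int& L = E ||| int R = F; L = L + R; }".
// Binding lhs only determines the identity of the object; the value of lhs
// is fetched inside the body, after rhs has been fully evaluated.
int& add_assign(int& lhs, int rhs) {
    lhs = lhs + rhs;
    return lhs;
}

int model_of_addendum(int start) {
    int i = start;
    add_assign(i, ++i + 1);   // no read of i in the first argument, so no UB
    return i;                 // (start + 1) + (start + 2)
}
```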
Let `E` be `i` and `F` be `++i` (we don't need the `+ 1`). Then, for `i = ++i`:

```
// Pseudocode: ||| is the fictive "unsequenced" operator, not real C++.
int* lhs_address;
int lhs_value;
int* rhs_address;
int rhs_value;

( lhs_address = &i)
||| (i = i+1, rhs_address = &i, rhs_value = *rhs_address);
*lhs_address = rhs_value;
```

On the other hand, for `i += ++i`:

```
( lhs_address = &i, lhs_value = *lhs_address)
||| (i = i+1, rhs_address = &i, rhs_value = *rhs_address);
int total_value = lhs_value + rhs_value;
*lhs_address = total_value;
```

This is intended to represent my understanding of the sequencing guarantees. Note that the `,` operator sequences all value computations and side effects of the LHS before those of the RHS. Parentheses do not affect sequencing. In the second case, `i += ++i`, we have a modification of `i` unsequenced w.r.t. an lvalue-to-rvalue conversion of `i` => UB.
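The pseudocode above can be turned into a runnable model by executing the two `|||` branches in each of the two possible serial orders. This is only a hypothetical illustration (the function names are invented): real unsequenced evaluation is undefined behavior and is not limited to these two orders. For `i = ++i` both orders store 1, while for `i += ++i` they store 1 and 2, because only the left branch of the second case fetches the value of `i`.

```cpp
// Hypothetical model of i = ++i: the left branch only computes an address.
int assign_model(bool left_first) {
    int i = 0;
    int* lhs_address = nullptr;
    int rhs_value = 0;
    auto left  = [&] { lhs_address = &i; };
    auto right = [&] { i = i + 1; rhs_value = i; };
    if (left_first) { left(); right(); } else { right(); left(); }
    *lhs_address = rhs_value;
    return i;
}

// Hypothetical model of i += ++i: the left branch also fetches the value of i.
int compound_model(bool left_first) {
    int i = 0;
    int* lhs_address = nullptr;
    int lhs_value = 0;
    int rhs_value = 0;
    auto left  = [&] { lhs_address = &i; lhs_value = *lhs_address; };
    auto right = [&] { i = i + 1; rhs_value = i; };
    if (left_first) { left(); right(); } else { right(); left(); }
    *lhs_address = lhs_value + rhs_value;
    return i;
}
```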

> The standard does not treat compound assignments as second-class primitives for which no separate definition of semantics is necessary.

I would say that's a redundancy. The rewrite from `E1 op= E2` to `E1 = E1 op E2` also covers which expression types and value categories are required (for the RHS; 5.17/1 says something about the LHS), what happens to pointer types, the required conversions, etc. The sad thing is that the sentence about "With respect to an.." in 5.17/1 is not listed in 5.17/7 as an exception to that equivalence.
In any case, I think we should compare the guarantees and requirements of compound assignment with those of simple assignment plus the operator, and see if there's any contradiction.
Once we put that "With respect to an.." also in the list of exceptions in 5.17/7, I don't think there's a contradiction.
As it turns out, as you can see in the discussion of Marc van Leeuwen's answer, this sentence leads to the following interesting observation:

```cpp
int i; // global
int& f() { return ++i; }
int main() {
    i = i + f(); // (A)
    i += f();    // (B)
}
```

It seems that (A) has two possible outcomes, since the evaluation of the body of `f` is indeterminately sequenced with respect to the value computation of the `i` in `i + f()`.

In (B), on the other hand, the evaluation of the body of `f()` is sequenced before the value computation of `i`, since `+=` must be seen as a single operation, and `f()` certainly needs to be evaluated before the assignment of `+=`.
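The observation can be reproduced with a hypothetical global `g_i` standing in for `i`. For (A), only the set of possible results {1, 2} is predictable. For (B), the result 2 is guaranteed at least from C++17 on, where the right operand of an assignment or compound assignment is sequenced before the left operand; under C++11 it rests on the reading of 5.17/1 discussed above.

```cpp
// Hypothetical reproduction of (A) and (B); g_i stands in for the global i.
int g_i = 0;
int& f() { return ++g_i; }

int run_a() {          // (A): i = i + f();
    g_i = 0;
    g_i = g_i + f();   // the read of g_i and the body of f() are indeterminately sequenced
    return g_i;        // 1 if g_i is read first, 2 if f() runs first
}

int run_b() {          // (B): i += f();
    g_i = 0;
    g_i += f();        // f() runs first, then g_i (now 1) is read and 1 is added
    return g_i;
}
```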
Solution 2:
The expression `i += ++i + 1` does invoke undefined behavior. The language-lawyer method requires us to go back to the defect report that made `i = ++i + 1;` well defined in C++11, which is defect report 637, "Sequencing rules and example disagree". It starts out saying:

> In 1.9 [intro.execution] paragraph 16, the following expression is still listed as an example of undefined behavior:
>
> `i = ++i + 1;`
>
> However, it appears that the new sequencing rules make this expression well-defined

The logic used in the report is as follows:

> 1. The assignment side-effect is required to be sequenced after the value computations of both its LHS and RHS (5.17 [expr.ass] paragraph 1).
> 2. The LHS (i) is an lvalue, so its value computation involves computing the address of i.
> 3. In order to value-compute the RHS (++i + 1), it is necessary to first value-compute the lvalue expression ++i and then do an lvalue-to-rvalue conversion on the result. This guarantees that the incrementation side-effect is sequenced before the computation of the addition operation, which in turn is sequenced before the assignment side effect. In other words, it yields a well-defined order and final value for this expression.

So in this question our problem changes the RHS, which goes from `++i + 1` to `i + (++i + 1)`, due to draft C++11 standard section 5.17 [expr.ass] "Assignment and compound assignment operators", which says:

> The behavior of an expression of the form E1 op = E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once. [...]

So now we have a situation where the computation of `i` in the RHS is not sequenced relative to `++i`, and so we have undefined behavior. This follows from section 1.9 paragraph 15, which says:

> Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. [ Note: In an expression that is evaluated more than once during the execution of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations. —end note ] The value computations of the operands of an operator are sequenced before the value computation of the result of the operator. If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

The pragmatic way to show this would be to use `clang` to test the code, which generates the following warning:

```
warning: unsequenced modification and access to 'i' [-Wunsequenced]
i += ++i + 1 ;
~~ ^
```

for this code:

```cpp
int main()
{
    int i = 0 ;
    i += ++i + 1 ;
}
```

This is further bolstered by this explicit test example in `clang`'s test suite for `-Wunsequenced`:

```cpp
a += ++a;
```

Solution 3:
Yes, it is UB!

The evaluation of your expression `i += ++i + 1` can be analyzed as follows.
5.17p1 (C++11) states:

> The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand and return an lvalue referring to the left operand. The result in all cases is a bit-field if the left operand is a bit-field. In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.

What does "value computation" mean? 1.9p12 gives the answer:

> Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression (or a sub-expression) in general includes both value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and initiation of side effects.

Since your code uses a compound assignment operator, 5.17p7 tells us how this operator behaves:

> The behavior of an expression of the form `E1 op= E2` is equivalent to `E1 = E1 op E2` except that `E1` is evaluated only once.

Hence the evaluation of the expression `E1` (`== i`) involves both determining the identity of the object designated by `i` and an lvalue-to-rvalue conversion to fetch the value stored in that object. But the evaluations of the two operands `E1` and `E2` are not sequenced with respect to each other. Thus we get undefined behavior, since the evaluation of `E2` (`== ++i + 1`) initiates a side effect (updating `i`).
1.9p15:

> ... If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

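The "E1 is evaluated only once" clause of 5.17p7 quoted above becomes visible when the left operand has a side effect of its own. In this sketch (the type `IndexSource` and the function names are hypothetical), the compound form calls `next()` once, while the spelled-out form calls it twice, in an unspecified order.

```cpp
#include <array>

// Hypothetical helper: each call returns the next index and counts invocations.
struct IndexSource {
    int calls = 0;
    int next() { return calls++; }
};

// E1 op= E2 evaluates E1 exactly once, so next() is called a single time.
int compound_calls() {
    std::array<int, 2> a{{0, 0}};
    IndexSource src;
    a[src.next()] += 10;                  // E1 is a[src.next()], evaluated once
    return src.calls;
}

// The spelled-out form evaluates the subscript twice (in an unspecified order).
int expanded_calls() {
    std::array<int, 2> a{{0, 0}};
    IndexSource src;
    a[src.next()] = a[src.next()] + 10;   // two calls; indices 0 and 1 stay in bounds
    return src.calls;
}
```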
The following statements in your question/comments seem to be the root of your misunderstanding:

> (2) the LHS of the assignment is also an lvalue, so its value evaluation does not involve fetching the current value of `i`

> fetching a value can be part of a prvalue evaluation. But in `E += F` the only prvalue is `F` so fetching the value of `E` is not part of the evaluation of the (lvalue) subexpression `E`

Whether an expression is an lvalue or an rvalue doesn't tell us anything about how this expression is to be evaluated. Some operators require lvalues as their operands; others require rvalues.
Clause 5p8:

> Whenever a glvalue expression appears as an operand of an operator that expects a prvalue for that operand, the lvalue-to-rvalue (4.1), array-to-pointer (4.2), or function-to-pointer (4.3) standard conversions are applied to convert the expression to a prvalue.

In a simple assignment, the evaluation of the LHS only requires determining the identity of the object. But in a compound assignment such as `+=`, the LHS must be a modifiable lvalue, but the evaluation of the LHS in this case consists of determining the identity of the object and an lvalue-to-rvalue conversion. It is the result of this conversion (which is a prvalue) that is added to the result (also a prvalue) of the evaluation of the RHS.

> But in `E += F` the only prvalue is `F` so fetching the value of `E` is not part of the evaluation of the (lvalue) subexpression `E`

That's not true, as I explained above. In your example `F` is a prvalue expression, but `F` may as well be an lvalue expression. In that case, the lvalue-to-rvalue conversion is also applied to `F`. 5.17p7, as cited above, tells us what the semantics of the compound assignment operators are. The standard states that the behavior of `E += F` is the same as that of `E = E + F`, but `E` is only evaluated once. Here, the evaluation of `E` includes the lvalue-to-rvalue conversion, because the binary operator `+` requires its operands to be prvalues.