Why is f(i = -1, i = -1) undefined behavior?

I was reading about order of evaluation violations, and they give an example that puzzles me.

1) If a side effect on a scalar object is un-sequenced relative to another side effect on the same scalar object, the behavior is undefined.

// snip
f(i = -1, i = -1); // undefined behavior

In this context, i is a scalar object, which apparently means

Arithmetic types (3.9.1), enumeration types, pointer types, pointer to member types (3.9.2), std::nullptr_t, and cv-qualified versions of these types (3.9.3) are collectively called scalar types.

I don’t see how the statement is ambiguous in that case. It seems to me that regardless of if the first or second argument is evaluated first, i ends up as -1, and both arguments are also -1.

Can someone please clarify?


UPDATE

I really appreciate all the discussion. So far, I like @harmic’s answer a lot since it exposes the pitfalls and intricacies of defining this statement in spite of how straight forward it looks at first glance. @acheong87 points out some issues that come up when using references, but I think that's orthogonal to the unsequenced side effects aspect of this question.


SUMMARY

Since this question got a ton of attention, I will summarize the main points/answers. First, allow me a small digression to point out that "why" can have closely related yet subtly different meanings, namely "for what cause", "for what reason", and "for what purpose". I will group the answers by which of those meanings of "why" they addressed.

for what cause

The main answer here comes from Paul Draper, with Martin J contributing a similar but not as extensive answer. Paul Draper's answer boils down to

It is undefined behavior because it is not defined what the behavior is.

The answer is overall very good in terms of explaining what the C++ standard says. It also addresses some related cases of UB such as f(++i, ++i); and f(i=1, i=-1);. In the first of the related cases, it's not clear if the first argument should be i+1 and the second i+2 or vice versa; in the second, it's not clear if i should be 1 or -1 after the function call. Both of these cases are UB because they fall under the following rule:

If a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object, the behavior is undefined.

Therefore, f(i=-1, i=-1) is also UB since it falls under the same rule, despite that the intention of the programmer is (IMHO) obvious and unambiguous.

Paul Draper also makes it explicit in his conclusion that

Could it have been defined behavior? Yes. Was it defined? No.

which brings us to the question of "for what reason/purpose was f(i=-1, i=-1) left as undefined behavior?"

for what reason / purpose

Although there are some oversights (maybe careless) in the C++ standard, many omissions are well-reasoned and serve a specific purpose. Although I am aware that the purpose is often either "make the compiler-writer's job easier", or "faster code", I was mainly interested to know if there is a good reason leave f(i=-1, i=-1) as UB.

harmic and supercat provide the main answers that provide a reason for the UB. Harmic points out that an optimizing compiler that might break up the ostensibly atomic assignment operations into multiple machine instructions, and that it might further interleave those instructions for optimal speed. This could lead to some very surprising results: i ends up as -2 in his scenario! Thus, harmic demonstrates how assigning the same value to a variable more than once can have ill effects if the operations are unsequenced.

supercat provides a related exposition of the pitfalls of trying to get f(i=-1, i=-1) to do what it looks like it ought to do. He points out that on some architectures, there are hard restrictions against multiple simultaneous writes to the same memory address. A compiler could have a hard time catching this if we were dealing with something less trivial than f(i=-1, i=-1).

davidf also provides an example of interleaving instructions very similar to harmic's.

Although each of harmic's, supercat's and davidf' examples are somewhat contrived, taken together they still serve to provide a tangible reason why f(i=-1, i=-1) should be undefined behavior.

I accepted harmic's answer because it did the best job of addressing all meanings of why, even though Paul Draper's answer addressed the "for what cause" portion better.

other answers

JohnB points out that if we consider overloaded assignment operators (instead of just plain scalars), then we can run into trouble as well.


Solution 1:

Since the operations are unsequenced, there is nothing to say that the instructions performing the assignment cannot be interleaved. It might be optimal to do so, depending on CPU architecture. The referenced page states this:

If A is not sequenced before B and B is not sequenced before A, then two possibilities exist:

  • evaluations of A and B are unsequenced: they may be performed in any order and may overlap (within a single thread of execution, the compiler may interleave the CPU instructions that comprise A and B)

  • evaluations of A and B are indeterminately-sequenced: they may be performed in any order but may not overlap: either A will be complete before B, or B will be complete before A. The order may be the opposite the next time the same expression is evaluated.

That by itself doesn't seem like it would cause a problem - assuming that the operation being performed is storing the value -1 into a memory location. But there is also nothing to say that the compiler cannot optimize that into a separate set of instructions that has the same effect, but which could fail if the operation was interleaved with another operation on the same memory location.

For example, imagine that it was more efficient to zero the memory, then decrement it, compared with loading the value -1 in. Then this:

f(i=-1, i=-1)

might become:

clear i
clear i
decr i
decr i

Now i is -2.

It is probably a bogus example, but it is possible.

Solution 2:

First, "scalar object" means a type like a int, float, or a pointer (see What is a scalar Object in C++?).


Second, it may seem more obvious that

f(++i, ++i);

would have undefined behavior. But

f(i = -1, i = -1);

is less obvious.

A slightly different example:

int i;
f(i = 1, i = -1);
std::cout << i << "\n";

What assignment happened "last", i = 1, or i = -1? It's not defined in the standard. Really, that means i could be 5 (see harmic's answer for a completely plausible explanation for how this chould be the case). Or you program could segfault. Or reformat your hard drive.

But now you ask: "What about my example? I used the same value (-1) for both assignments. What could possibly be unclear about that?"

You are correct...except in the way the C++ standards committee described this.

If a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object, the behavior is undefined.

They could have made a special exception for your special case, but they didn't. (And why should they? What use would that ever possibly have?) So, i could still be 5. Or your hard drive could be empty. Thus the answer to your question is:

It is undefined behavior because it is not defined what the behavior is.

(This deserves emphasis because many programmers think "undefined" means "random", or "unpredictable". It doesn't; it means not defined by the standard. The behavior could be 100% consistent, and still be undefined.)

Could it have been defined behavior? Yes. Was it defined? No. Hence, it is "undefined".

That said, "undefined" doesn't mean that a compiler will format your hard drive...it means that it could and it would still be a standards-compliant compiler. Realistically, I'm sure g++, Clang, and MSVC will all do what you expected. They just wouldn't "have to".


A different question might be Why did the C++ standards committee choose to make this side-effect unsequenced?. That answer will involve history and opinions of the committee. Or What is good about having this side-effect unsequenced in C++?, which permits any justification, whether or not it was the actual reasoning of the standards committee. You could ask those questions here, or at programmers.stackexchange.com.

Solution 3:

A practical reason to not make an exception from the rules just because the two values are the same:

// config.h
#define VALUEA  1

// defaults.h
#define VALUEB  1

// prog.cpp
f(i = VALUEA, i = VALUEB);

Consider the case this was allowed.

Now, some months later, the need arises to change

 #define VALUEB 2

Seemingly harmless, isn't it? And yet suddenly prog.cpp wouldn't compile anymore. Yet, we feel that compilation should not depend on the value of a literal.

Bottom line: there is no exception to the rule because it would make successful compilation depend on the value (rather the type) of a constant.

EDIT

@HeartWare pointed out that constant expressions of the form A DIV B are not allowed in some languages, when B is 0, and cause compilation to fail. Hence changing of a constant could cause compilation errors in some other place. Which is, IMHO, unfortunate. But it is certainly good to restrict such things to the unavoidable.