For { A=a; B=b; }, will "A=a" be strictly executed before "B=b"?

Both the C and C++ standards allow these statements to be executed out of order, so long as that does not change the observable behaviour of the program. This is known as the as-if rule:

  • What exactly is the "as-if" rule?
  • http://en.cppreference.com/w/cpp/language/as_if

Note that, as is pointed out in the comments, "observable behaviour" here means the observable behaviour of a program whose behaviour is defined. If your program has undefined behaviour, the compiler is excused from reasoning about it.


The compiler is only obligated to emulate the observable behavior of the program, so a reordering that does not violate that principle is allowed. If, on the other hand, your program contains undefined behavior such as a data race, then its behavior is unpredictable, and, as noted in the comments, you would need some form of synchronization to protect the critical section.
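For illustration, here is a minimal sketch of such synchronization using std::mutex from C++11; the function names and variables are hypothetical, not from the question:

#include <mutex>

int A, B;      // shared state
std::mutex m;  // protects A and B

void writer(int a, int b)
{
    std::lock_guard<std::mutex> lock(m);  // critical section begins
    A = a;
    B = b;
}  // lock released; a thread that later acquires m sees both stores

void reader(int& a, int& b)
{
    std::lock_guard<std::mutex> lock(m);
    a = A;
    b = B;
}

Note that inside the critical section the compiler is still free to reorder A = a and B = b under the as-if rule; the point is that no other thread can observe the intermediate state, because it cannot acquire m while the writer holds it.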

A Useful Reference

An interesting article that covers this is Memory Ordering at Compile Time, which says:

The cardinal rule of memory reordering, which is universally followed by compiler developers and CPU vendors, could be phrased as follows:

Thou shalt not modify the behavior of a single-threaded program.

An Example

The article provides a simple program where we can see this reordering:

int A, B;  // Note: static storage duration so initialized to zero

void foo()
{
    A = B + 1;
    B = 0;
}

and shows that at higher optimization levels B = 0 is performed before A = B + 1. We can reproduce this result using godbolt, which with -O3 produces the following (see it live):

movl    $0, B(%rip) #, B
addl    $1, %eax    #, D.1624
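If all you need is to stop the compiler (not the CPU) from reordering these two stores, the article goes on to discuss compiler barriers. As a sketch of that technique, on GCC and Clang an empty asm statement with a "memory" clobber acts as such a barrier; this is a compiler-specific extension, not standard C++:

int A, B;

void foo()
{
    A = B + 1;
    asm volatile("" ::: "memory");  // compiler barrier: the store to A may
                                    // not be moved past this point by the
                                    // compiler
    B = 0;
}

Keep in mind this only constrains the compiler; the processor may still reorder the stores at run time, which is why lock-free code ultimately needs fences or atomics, as discussed further below.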

Why?

Why does the compiler reorder? The article explains that it does so for exactly the same reason the processor does: performance optimization, a consequence of the complexity of modern CPUs:

As I mentioned at the start, the compiler modifies the order of memory interactions for the same reason that the processor does it – performance optimization. Such optimizations are a direct consequence of modern CPU complexity.

Standards

In the draft C++ standard this is covered in section 1.9 Program execution, which says (emphasis mine going forward):

The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.5

Footnote 5 tells us this is also known as the as-if rule:

This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program. For instance, an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side effects affecting the observable behavior of the program are produced.
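As a concrete illustration of that last sentence (my example, not the standard's), a compiler may simply drop a computation whose result is never used:

int f(int x)
{
    int unused = x * x;  // value never read and no observable side effects,
                         // so the implementation need not evaluate it at all
    return x + 1;        // a typical optimizing compiler emits only this
}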

The draft C99 and draft C11 standards cover this in section 5.1.2.3 Program execution, although we have to go to the index to see that it is called the as-if rule in the C standard as well:

as-if rule, 5.1.2.3

Update on Lock-Free Considerations

The article An Introduction to Lock-Free Programming covers this topic well, and for the OP's concerns about a lock-less shared hash table implementation this section is probably the most relevant:

Memory Ordering

As the flowchart suggests, any time you do lock-free programming for multicore (or any symmetric multiprocessor), and your environment does not guarantee sequential consistency, you must consider how to prevent memory reordering.

On today’s architectures, the tools to enforce correct memory ordering generally fall into three categories, which prevent both compiler reordering and processor reordering:

  • A lightweight sync or fence instruction, which I’ll talk about in future posts;
  • A full memory fence instruction, which I’ve demonstrated previously;
  • Memory operations which provide acquire or release semantics.

Acquire semantics prevent memory reordering of operations which follow it in program order, and release semantics prevent memory reordering of operations preceding it. These semantics are particularly suitable in cases when there’s a producer/consumer relationship, where one thread publishes some information and the other reads it. I’ll also talk about this more in a future post.
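To connect this back to the original question, here is a minimal sketch of that producer/consumer pattern using C++11 atomics; the names payload and ready are illustrative:

#include <atomic>

int payload;                     // the data being published
std::atomic<bool> ready{false};  // the guard flag

void producer()
{
    payload = 42;                                  // "A = a"
    ready.store(true, std::memory_order_release);  // "B = b": release
    // semantics keep the payload store from being reordered after this
}

void consumer()
{
    while (!ready.load(std::memory_order_acquire))  // acquire semantics keep
        ;                                           // the payload read from
                                                    // moving before this load
    int value = payload;  // guaranteed to observe 42 here
    (void)value;
}

With two plain non-atomic stores, as in the original question, both the compiler and the processor would be free to reorder them, which is why the answer is: not necessarily.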