What's the relationship between CPU out-of-order execution and memory order?

As I understand it, a CPU may execute operations in a different order than they appear in the machine code, as an optimization. This is called out-of-order execution.

The term "memory order" refers to the order in which memory accesses take place. For example, relaxed ordering defines very weak rules, so reordering can happen easily.
There are also memory ordering models, such as TSO on x86. Such a model defines the semantics of the order in which the processor's memory accesses happen.

What I don't understand is how the two are related. Is memory reordering a kind of out-of-order execution, and are there other forms of out-of-order execution besides it?
Or is the memory ordering model what governs out-of-order execution, so that every reordering the processor performs has to follow its semantics?


Solution 1:

The general issue is that on a modern multiprocessor system, load and store instructions may become visible to other cores in a different order than program order. Out-of-order execution is one way in which this can happen, but there are others.

For instance, you could have a CPU which executes and retires all instructions in strict program order, but when it does a store instruction, instead of committing it to L1 cache immediately, it puts it in a store buffer to be written to cache later. The store buffer could be designed to write out stores in a different order than they came in; for instance, if a first store misses L1 cache but a second one would hit, you could save time by writing out the second one while waiting for the first one's cache line to load.
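As a rough illustration of that second idea, here is a toy simulation of a store buffer that prefers stores whose cache line is already in L1. Everything in it is hypothetical: the function name `drain_order`, the use of variable names to stand in for cache lines, and the drain policy itself are assumptions made for the sketch, not a description of any real CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <string>
#include <vector>

// Toy model (hypothetical): returns the order in which buffered stores reach
// L1 and so become visible to other cores. `program_order` lists the stored-to
// variables in program order; `lines_in_l1` holds the variables whose cache
// lines are already present in L1.
std::vector<std::string> drain_order(const std::vector<std::string>& program_order,
                                     std::set<std::string> lines_in_l1) {
    std::deque<std::string> buffer(program_order.begin(), program_order.end());
    std::vector<std::string> visible;
    while (!buffer.empty()) {
        // Assumed policy: write out the oldest buffered store that hits in L1.
        bool wrote_hit = false;
        for (std::size_t i = 0; i < buffer.size(); ++i) {
            if (lines_in_l1.count(buffer[i]) != 0) {
                visible.push_back(buffer[i]);
                buffer.erase(buffer.begin() + static_cast<std::ptrdiff_t>(i));
                wrote_hit = true;
                break;
            }
        }
        if (!wrote_hit) {
            // Nothing hits: model waiting for the oldest store's line to load.
            lines_in_l1.insert(buffer.front());
        }
    }
    // Every buffered store must eventually be written out exactly once.
    assert(visible.size() == program_order.size());
    return visible;
}
```

With stores to `A` then `B` in program order and only `B`'s line in L1, this model makes `B` visible before `A`, even though the instructions themselves were executed in order.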

Or, even if the store buffer doesn't reorder, you could have a situation where, while a store is still waiting in the store buffer, the CPU executes a load instruction that came later in program order. Other cores will thus see the load happening before the store. This is the situation with x86, for instance.
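This store-then-load case is usually demonstrated with the "store buffering" litmus test. The sketch below uses C++ atomics rather than raw machine instructions, so the compiler is also free to reorder the relaxed accesses; the function name `count_both_zero` is made up for the example. It counts how often both threads read 0, an outcome that is allowed with relaxed ordering but forbidden when every access is `memory_order_seq_cst`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the store-buffering litmus test `iterations` times and returns how
// often both threads read 0, i.e. each load appeared to run before the other
// thread's store became visible.
int count_both_zero(std::memory_order store_order,
                    std::memory_order load_order,
                    int iterations) {
    assert(iterations >= 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1, store_order);   // may sit in the store buffer
            r1 = y.load(load_order);   // may complete before the store is visible
        });
        std::thread t2([&] {
            y.store(1, store_order);
            r2 = x.load(load_order);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling it with `std::memory_order_seq_cst` for both arguments must always return 0. With `std::memory_order_relaxed` a nonzero count is permitted but not guaranteed; because the threads are spawned fresh on every iteration, their accesses rarely overlap, so you may need many runs (or a tighter harness) to actually observe it.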

The memory ordering model defines, in an abstract way, what the programmer is entitled to expect about the order in which loads and stores become visible to other cores (or other hardware, etc.). It also usually specifies how the programmer can gain stronger guarantees when needed (e.g. by executing barrier instructions). The CPU then has to be designed to provide the defined behavior, which may place constraints on the features it can include. For instance, if the architecture promises TSO, the CPU probably can't include a store buffer that's capable of reordering stores, unless the designers manage to do it in such a clever way that the reordering can never be noticed by other cores.
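To make the "stronger guarantees" part concrete, here is a minimal sketch at the C++ level, where `std::atomic_thread_fence` plays the role of the barrier instruction. The function name `count_both_zero_with_fence` is invented for the example. Each thread does a relaxed store, then a sequentially consistent fence, then a relaxed load of the other variable; with the fence in both threads, the language rules no longer allow both loads to return 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-then-load in two threads, with a full fence between the store and the
// load. Returns how often both threads read 0; the fences rule that out.
int count_both_zero_with_fence(int iterations) {
    assert(iterations >= 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            // Barrier: the store must be ordered before the following load.
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

How the fence is implemented is up to the compiler and the target architecture; the point is only that the programmer asks the abstract model for the guarantee, and the hardware has to honor it by whatever means it has.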

Related questions:

  • Are memory barriers needed because of cpu out of order execution or because of cache consistency problem?

  • Out of Order Execution and Memory Fences

  • How does memory reordering help processors and compilers?

  • How do modern Intel x86 CPUs implement the total order over stores