Understanding `memory_order_acquire` and `memory_order_release` in C++11
I'm reading through the documentation on std::memory_order, and more specifically these two entries:
memory_order_acquire: A load operation with this memory order performs the acquire operation on the affected memory location: no reads or writes in the current thread can be reordered before this load. All writes in other threads that release the same atomic variable are visible in the current thread (see Release-Acquire ordering below).
memory_order_release: A store operation with this memory order performs the release operation: no reads or writes in the current thread can be reordered after this store. All writes in the current thread are visible in other threads that acquire the same atomic variable (see Release-Acquire ordering below) and writes that carry a dependency into the atomic variable become visible in other threads that consume the same atomic (see Release-Consume ordering below).
These two bits:
from memory_order_acquire
... no reads or writes in the current thread can be re-ordered before this load...
from memory_order_release
... no reads or writes in the current thread can be re-ordered after this store...
What exactly do they mean?
There's also this example:
#include <thread>
#include <atomic>
#include <cassert>
#include <string>

std::atomic<std::string*> ptr;
int data;

void producer()
{
    std::string* p = new std::string("Hello");
    data = 42;
    ptr.store(p, std::memory_order_release);
}

void consumer()
{
    std::string* p2;
    while (!(p2 = ptr.load(std::memory_order_acquire)))
        ;
    assert(*p2 == "Hello"); // never fires
    assert(data == 42);     // never fires
}

int main()
{
    std::thread t1(producer);
    std::thread t2(consumer);
    t1.join(); t2.join();
}
But I cannot really figure out where the two bits I've quoted apply. I understand what's happening, but I don't really see the reordering bit because the code is small.
The work done by a thread is not guaranteed to be visible to other threads.
To make data visible between threads, a synchronization mechanism is needed. A non-relaxed atomic or a mutex can be used for that. This is called acquire-release semantics: unlocking a mutex "releases" all memory writes made before it, and locking the same mutex "acquires" those writes.
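For comparison, here is a minimal mutex-based sketch of the same hand-off (mutex_producer, mutex_consumer, shared_p and shared_data are illustrative names, not part of the question's example). Unlocking in the producer "releases" the writes made before it, and locking in the consumer "acquires" them:

#include <cassert>
#include <mutex>
#include <string>

std::mutex m;
std::string* shared_p = nullptr;  // protected by m
int shared_data = 0;              // protected by m

void mutex_producer()
{
    std::string* p = new std::string("Hello");
    std::lock_guard<std::mutex> lock(m);
    shared_data = 42;
    shared_p = p;
    // unlocking at end of scope "releases" everything written above
}

void mutex_consumer()
{
    for (;;) {
        std::lock_guard<std::mutex> lock(m);  // locking "acquires" the producer's writes
        if (shared_p) {
            assert(*shared_p == "Hello");  // guaranteed: the string is fully constructed
            assert(shared_data == 42);     // guaranteed: the write is visible
            break;
        }
    }
}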
In the question's example we use ptr to "release" the work done so far (data = 42) to another thread:
data = 42;
ptr.store(p, std::memory_order_release); // changes ptr from null to not-null
And here we wait for that, and by doing that we synchronize ("acquire") the work done by the producer thread:
while (!ptr.load(std::memory_order_acquire)) // assuming initially ptr is null
;
assert(data == 42);
Note two distinct actions:
1. we wait between threads (the synchronization step);
2. as a side effect of the wait, the work is transferred from the producer to the consumer (the producer releases it and the consumer acquires it).
In the absence of (2), e.g. when using memory_order_relaxed, only the atomic value itself is synchronized. All other work done before/after isn't, e.g. data won't necessarily contain 42 and there may not be a fully constructed string instance at the address p (as seen by the consumer).
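To see the difference, here is a sketch of a relaxed variant of the question's example (producer_relaxed and consumer_relaxed are just illustrative names; ptr and data are the globals from the question). With memory_order_relaxed the asserts are allowed to fire, and the unsynchronized access to data is in fact a data race:

void producer_relaxed()
{
    std::string* p = new std::string("Hello");
    data = 42;
    ptr.store(p, std::memory_order_relaxed);  // no "release": data is not published with p
}

void consumer_relaxed()
{
    std::string* p2;
    while (!(p2 = ptr.load(std::memory_order_relaxed)))  // no "acquire"
        ;
    // Neither *p2 nor data is guaranteed to be visible here.
    assert(*p2 == "Hello");  // may fire
    assert(data == 42);      // may fire
}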
For more details about acquire/release semantics and the rest of the C++ memory model I would recommend watching Herb Sutter's excellent "atomic<> Weapons" talk; it's long but fun to watch. And for even more detail there's Anthony Williams' book "C++ Concurrency in Action".
Acquire and Release are Memory Barriers.
If your program reads data after an acquire barrier, you are assured that what it reads is consistent with anything written before a preceding release on the same atomic variable in any other thread. The reads and writes of an atomic variable are guaranteed to have a single, agreed order across all threads (when using memory_order_acquire and memory_order_release; weaker orderings are also provided). These barriers in effect propagate that order to the surrounding memory used by the threads that share the atomic variable.
You can use atomics to indicate that something has 'finished' or is 'ready', but if the consumer that reads beyond that atomic variable couldn't rely on 'seeing' the right 'versions' of other memory, atomics would be of limited value.
The statements about 'reordered before' and 'reordered after' are constraints on the optimizer (and on the CPU): operations must not be moved across the barrier so that they take place out of order. Optimizers are very good at reordering instructions and even omitting redundant reads and writes, but if they reorganised the code across the memory barriers they could unwittingly violate that order.
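The same barriers can also be spelled out as standalone fences, which makes the "nothing may move across this line" reading explicit. Here is a sketch equivalent to the question's producer/consumer (the *_with_fences names are mine), assuming the same ptr and data globals:

void producer_with_fences()
{
    std::string* p = new std::string("Hello");
    data = 42;
    std::atomic_thread_fence(std::memory_order_release);  // writes above may not sink below this line
    ptr.store(p, std::memory_order_relaxed);
}

void consumer_with_fences()
{
    std::string* p2;
    while (!(p2 = ptr.load(std::memory_order_relaxed)))
        ;
    std::atomic_thread_fence(std::memory_order_acquire);  // reads below may not hoist above this line
    assert(*p2 == "Hello");
    assert(data == 42);
}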
Your code relies on the std::string object (a) having been constructed in producer() before ptr is assigned, and (b) the constructed version of that string (i.e. the version of the memory it occupies) being the one that consumer() reads.

Put simply, consumer() is going to eagerly read the string as soon as it sees ptr assigned, so it damn well better see a valid and fully constructed object or bad times will ensue.
In that code, 'the act' of assigning ptr is how producer() 'tells' consumer() the string is 'ready'. The memory barrier exists to make sure that's what the consumer sees.
Conversely, if ptr were declared as an ordinary std::string* then the compiler could decide to optimize p away, assign the allocated address directly to ptr, and only then construct the object and assign the int data. That is likely a disaster for the consumer thread, which is using that assignment as the indicator that the objects producer is preparing are ready.
To be accurate, if ptr were a plain pointer the consumer might never see the value assigned, or on some architectures might read a partially assigned value in which only some of the bytes have been written, leaving it pointing at a garbage memory location. However, those aspects are about the variable being atomic, not about the wider memory barriers.
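For illustration, here is a sketch of that broken variant (plain_ptr, plain_data and the broken_* names are mine). With no atomic and no barriers this has a data race and is undefined behaviour; it only shows what the compiler and hardware would be free to do:

#include <cassert>
#include <string>

std::string* plain_ptr = nullptr;  // NOT atomic: no ordering, no atomicity guarantees
int plain_data = 0;

void broken_producer()
{
    // The compiler may legally reorder these steps, e.g. publish plain_ptr
    // before the string is constructed or before plain_data is written.
    std::string* p = new std::string("Hello");
    plain_data = 42;
    plain_ptr = p;
}

void broken_consumer()
{
    // Data race: the loop may never see the new value (the compiler can hoist
    // the load), or on some architectures may see a torn, partially written pointer.
    while (!plain_ptr)
        ;
    assert(*plain_ptr == "Hello");  // may fire, or worse
    assert(plain_data == 42);       // may fire
}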