C++11 Implementation of Spinlock using <atomic>
Solution 1:
The problem is that compare_exchange_weak updates the unlatched variable once it fails. From the documentation of compare_exchange_weak:

Compares the contents of the atomic object's contained value with expected:
- if true, it replaces the contained value with val (like store).
- if false, it replaces expected with the contained value.

I.e., after the first failing compare_exchange_weak, unlatched will be updated to true, so the next loop iteration will try to compare_exchange_weak true with true. This succeeds and you just took a lock that was held by another thread.
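A minimal, self-contained illustration of that side effect (the names latch and unlatched mirror the snippet below; latch starting out as true stands in for a lock that another thread already holds):

#include <atomic>
#include <cassert>

int main() {
    std::atomic<bool> latch{true};   // pretend another thread already holds the lock
    bool unlatched = false;

    bool ok = latch.compare_exchange_weak(unlatched, true);
    assert(!ok);                // the exchange fails, as it should...
    assert(unlatched == true);  // ...but the expected value got overwritten
    // A second compare_exchange_weak would now compare true with true,
    // "succeed", and steal the lock another thread still holds.
}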
Solution:
Make sure to set unlatched back to false before each compare_exchange_weak, e.g.:
while (!latch.compare_exchange_weak(unlatched, true, std::memory_order_acquire)) {
    unlatched = false;   // compare_exchange_weak overwrote it with the current value (true)
}
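Put together, the corrected class could look roughly like this (a sketch, assuming the member is a std::atomic<bool> named latch as in the question):

#include <atomic>

class SpinLock {
    std::atomic<bool> latch{false};   // false = unlocked
public:
    void lock() {
        bool unlatched = false;
        while (!latch.compare_exchange_weak(unlatched, true,
                                            std::memory_order_acquire)) {
            unlatched = false;   // reset the expected value after every failed attempt
        }
    }
    void unlock() {
        latch.store(false, std::memory_order_release);
    }
};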
Solution 2:
As mentioned by @gexicide, the problem is that the compare_exchange functions update the expected variable with the current value of the atomic variable. That is also the reason why you have to use the local variable unlatched in the first place. To solve this, you can set unlatched back to false in each loop iteration.

However, instead of using compare_exchange for something its interface is rather ill-suited for, it is much simpler to use std::atomic_flag instead:
#include <atomic>

class SpinLock {
    std::atomic_flag locked = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // Spin until the previous value was "clear", i.e. we acquired the flag.
        while (locked.test_and_set(std::memory_order_acquire)) { ; }
    }
    void unlock() {
        locked.clear(std::memory_order_release);
    }
};
Source: cppreference
Manually specifying the memory order is just a minor potential performance tweak, which I copied from the source. If simplicity is more important than the last bit of performance, you can stick to the default values and just call locked.test_and_set() / locked.clear().
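Since the class provides lock() and unlock() it satisfies the BasicLockable requirements, so it works with std::lock_guard just like a mutex. A small usage sketch (building on the SpinLock class above; thread and iteration counts are arbitrary):

#include <mutex>
#include <thread>
#include <vector>

SpinLock spin;
long counter = 0;

void work() {
    for (int i = 0; i < 100000; ++i) {
        std::lock_guard<SpinLock> guard(spin);   // calls lock() here, unlock() on scope exit
        ++counter;
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) threads.emplace_back(work);
    for (auto& t : threads) t.join();
    // counter is 400000 if the lock provides mutual exclusion
}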
Btw.: std::atomic_flag is the only type that is guaranteed to be lock-free, although I don't know any platform where operations on std::atomic_bool are not lock-free.
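If you want to check on your own platform, you can query std::atomic<bool> at runtime with is_lock_free(), or look at the ATOMIC_BOOL_LOCK_FREE macro at compile time (2 means always lock-free):

#include <atomic>
#include <iostream>

int main() {
    std::atomic<bool> b;
    std::cout << "runtime: " << b.is_lock_free() << '\n';
    std::cout << "compile time (2 = always lock-free): " << ATOMIC_BOOL_LOCK_FREE << '\n';
}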
Update: As explained in the comments by @David Schwartz, @Anton and @Technik Empire, the empty loop has some undesirable effects like branch misprediction, thread starvation on HT processors and overly high power consumption, so in short it is a pretty inefficient way to wait. The impact and solution are architecture, platform and application specific. I'm no expert, but the usual solution seems to be to add either a cpu_relax() on Linux or YieldProcessor() on Windows to the loop body.
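For example, on x86 the usual hint is the PAUSE instruction, exposed as the _mm_pause() intrinsic. A rough sketch of lock() with such a hint (assuming an x86 target and a compiler that provides the intrinsic; other architectures would use their own equivalent):

#include <atomic>
#if defined(_MSC_VER)
#include <intrin.h>       // _mm_pause on MSVC
#elif defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>    // _mm_pause on GCC/Clang
#endif

class SpinLock {
    std::atomic_flag locked = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (locked.test_and_set(std::memory_order_acquire)) {
            // Tell the CPU we are in a spin-wait loop (x86 PAUSE).
#if defined(_MSC_VER) || defined(__x86_64__) || defined(__i386__)
            _mm_pause();
#endif
        }
    }
    void unlock() {
        locked.clear(std::memory_order_release);
    }
};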
EDIT2: Just to be clear: the portable version presented here (without the special cpu_relax etc. instructions) should already be good enough for many applications. If your SpinLock spins a lot because someone else is holding the lock for a long time (which might already indicate a general design problem), it is probably better to use a normal mutex anyway.