Why does this spinlock require memory_order_acq_rel instead of just acquire?

Solution 1:

std::memory_order_acq_rel is not required.

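Mutex synchronization happens between two threads: the one that releases the lock (and with it the protected data) and the one that subsequently acquires it.
Other threads that are merely spinning on the flag take no part in that handoff, so it is irrelevant whether their operations have release or acquire semantics. In particular, release semantics on the test_and_set in lock() buy nothing: a thread that has just taken the lock has not yet written any protected data, and whatever it writes later is published by the release in unlock().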
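To make that concrete, here is a minimal sketch of the variant the question asks about, with plain acquire on the spinning test_and_set. It assumes, as the answer's code does, a global std::atomic_flag named flag; the function handoff_demo is a hypothetical harness, not part of the original. The successful test_and_set reads the value stored by the previous owner's release clear(), so those two operations synchronize and the writer's data is visible to the reader.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>

// Assumption: the lock is a global std::atomic_flag named flag.
std::atomic_flag flag = ATOMIC_FLAG_INIT;

void lock() {
  // Acquire on the RMW is sufficient. The test_and_set that finally reads the
  // 'false' stored by the previous owner's clear(release) synchronizes with it;
  // failed iterations only overwrite 'true' with 'true' and hand nothing off.
  while (flag.test_and_set(std::memory_order_acquire))
    ;
}

void unlock() {
  // Release publishes every write made inside the critical section.
  flag.clear(std::memory_order_release);
}

// Hypothetical demo: one thread publishes data under the lock,
// another thread polls for it under the same lock.
std::string handoff_demo() {
  bool ready = false;   // protected by the lock (plain bool, not atomic)
  std::string payload;  // protected by the lock
  std::thread writer([&] {
    lock();
    payload = "hello";
    ready = true;
    unlock();
  });
  std::string seen;
  for (bool done = false; !done;) {
    lock();
    if (ready) {
      // Seeing ready == true means the writer's critical section
      // happens-before this one, so payload is fully visible.
      assert(payload == "hello");
      seen = payload;
      done = true;
    }
    unlock();
    std::this_thread::yield();  // give the writer a chance to take the lock
  }
  writer.join();
  return seen;
}
```

Calling handoff_demo() returns "hello": only the writer's unlock() and the reader's successful lock() form the release/acquire pair that matters.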

Perhaps it is more intuitive (and potentially more efficient) to let the spinning operations be relaxed and handle the acquire with a standalone fence after the loop:

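```cpp
#include <atomic>

std::atomic_flag flag = ATOMIC_FLAG_INIT;

void lock(){
  // Spin with relaxed RMWs; ordering is provided by the fence below.
  while(flag.test_and_set(std::memory_order_relaxed) )
    ;
  // Synchronizes with the release clear() of the previous owner.
  std::atomic_thread_fence(std::memory_order_acquire);
}

void unlock(){
  flag.clear(std::memory_order_release);
}
```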

Multiple threads can spin on flag.test_and_set, but only one of them reads the cleared value and sets it again in a single atomic operation. Only that thread leaves the while loop and, through the acquire fence, acquires the protected data.
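A small harness can exercise the fence-based lock. This is a sketch under stated assumptions: the struct name SpinLock and the function run_counter_demo are hypothetical wrappers, not from the original. Several threads increment a plain, non-atomic long under the lock; if only one thread at a time leaves the loop and each new owner sees the previous owner's writes, no increment is lost.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical wrapper around the answer's lock()/unlock().
struct SpinLock {
  std::atomic_flag flag = ATOMIC_FLAG_INIT;

  void lock() {
    // Relaxed spinning; the fence provides acquire ordering once we win.
    while (flag.test_and_set(std::memory_order_relaxed))
      ;
    std::atomic_thread_fence(std::memory_order_acquire);
  }

  void unlock() { flag.clear(std::memory_order_release); }
};

// Hypothetical demo: each thread increments a shared counter under the lock.
long run_counter_demo(int num_threads, int iterations) {
  assert(num_threads > 0 && iterations >= 0);
  SpinLock spin;
  long counter = 0;  // protected data, deliberately not atomic
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < iterations; ++i) {
        spin.lock();
        ++counter;
        spin.unlock();
      }
    });
  }
  for (auto& w : workers) w.join();  // join() makes all increments visible here
  return counter;
}
```

For example, run_counter_demo(4, 20000) should return 80000. A correct count is evidence rather than proof, since ordering bugs may not show up on a strongly ordered CPU such as x86.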