How does an atomic operation guarantee consistency from a hardware perspective?

  1. As far as I know, an atomic instruction makes sure that when it is executed, no other threads can modify that data (just like a critical section). Am I correct?
  2. How is this implemented in hardware?
  3. How does hardware guarantee this? (Does the hardware generate three micro instructions internally: lock, modify, and unlock?)
  4. What is the difference between just using a mutex vs. an atomic instruction? Is the only difference the number of instructions (1 instruction for atomic, multiple instructions for a normal mutex)?
  5. Does that difference in the number of instructions (1 vs. many) guarantee correctness (as a mutex does) and consistency?

The details are complex. On a single processor it is simple enough to implement some equivalent of "lock, modify, unlock" at the microcode level, or to use other techniques.

Once you have multiple processors the subject gets complex, especially in view of cache effects. Cache-coherence protocols such as MSI and its derivatives (MESI, MOSI, MOESI) are what make atomic operations work across cores; modern Intel processors, for example, use a MESI variant (MESIF).

Wikipedia's Cache coherence article has a good summary as well.
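
To make the protocol idea concrete, here is a toy C++ model of a single cache line under simplified MESI rules. It is only a sketch: the `Line` type and its `read`/`write` methods are hypothetical names invented for illustration, and real hardware handles many more cases (write-backs, bus transactions, extra states such as F or O).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of ONE cache line as seen by several caches (simplified MESI).
enum class State { Modified, Exclusive, Shared, Invalid };

struct Line {
    std::vector<State> caches;  // state of the line in each cache

    explicit Line(std::size_t n) : caches(n, State::Invalid) {}

    // Cache `i` reads the line.
    void read(std::size_t i) {
        assert(i < caches.size());
        if (caches[i] != State::Invalid) return;  // hit: state unchanged
        bool others_have_copy = false;
        for (std::size_t j = 0; j < caches.size(); ++j) {
            if (j != i && caches[j] != State::Invalid) {
                // An M copy would be written back first; M and E drop to S.
                caches[j] = State::Shared;
                others_have_copy = true;
            }
        }
        caches[i] = others_have_copy ? State::Shared : State::Exclusive;
    }

    // Cache `i` writes the line: every other copy is invalidated first,
    // so the writer is the sole owner while it modifies the data.
    void write(std::size_t i) {
        assert(i < caches.size());
        for (std::size_t j = 0; j < caches.size(); ++j) {
            if (j != i) caches[j] = State::Invalid;
        }
        caches[i] = State::Modified;
    }
};
```

The point of the sketch is the `write` path: a core can only modify a line once no other cache holds a valid copy, which is the property an atomic read-modify-write builds on.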

As to a mutex vs. an atomic instruction: a mutex is, more or less, an agreement that one piece of memory will be updated atomically so that one and only one thread at a time can set it to the "locked" state. That means it uses atomic operations to protect non-atomic operations. It is a protocol that all parties agree to follow, and it lets you behave as if you were atomic at a larger scale than the hardware really allows.
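
As a rough illustration of that agreement, the following C++ sketch builds a minimal spinlock from one atomic flag and uses it to protect a plain, non-atomic counter. `SpinLock` and `run_locked_increments` are hypothetical names made up for this example, and a real mutex such as `std::mutex` would normally sleep rather than spin.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Minimal spinlock: the only atomic operation is on `flag`.
class SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // test_and_set atomically sets the flag and returns its old value;
        // keep spinning while another thread already held it.
        while (flag.test_and_set(std::memory_order_acquire)) { /* spin */ }
    }
    void unlock() { flag.clear(std::memory_order_release); }
};

// Every thread agrees to take the lock before touching `counter`.
long run_locked_increments(int threads, int per_thread) {
    assert(threads >= 0 && per_thread >= 0);
    SpinLock lock;
    long counter = 0;  // plain, non-atomic variable
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                lock.lock();
                ++counter;  // non-atomic read-modify-write, guarded by the lock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Calling `run_locked_increments(4, 10000)` should return 40000, but only because every thread follows the protocol. A thread that incremented `counter` without taking the lock would break the guarantee.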


On the majority of modern CPUs, an atomic operation works by locking the affected cache line in the CPU's cache. The core acquires the line containing the memory address exclusively in its cache and then does not permit any other core to acquire or share that line until the operation completes.
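
From the software side, this is what a single atomic read-modify-write looks like. The C++ sketch below uses `std::atomic<long>::fetch_add`; the function name `run_atomic_increments` is hypothetical. On x86 a compiler will typically turn the `fetch_add` into one `lock`-prefixed instruction, though the exact code generated depends on the compiler and target.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Several threads increment one shared counter with no mutex at all.
long run_atomic_increments(int threads, int per_thread) {
    assert(threads >= 0 && per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                // One indivisible read-modify-write: no increment can be lost.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();  // join() makes all increments visible here
    return counter.load();
}
```

Unlike the mutex approach, the atomicity here covers only the single increment. It cannot be stretched to cover a sequence of several operations, which is the case a mutex exists for.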