x86 LOCK question on multi-core CPUs
Solution 1:
It's about locking the memory bus for that address. The Intel 64 and IA-32 Architectures Software Developer's Manual - Volume 3A: System Programming Guide, Part 1 tells us:
7.1.4 Effects of a LOCK Operation on Internal Processor Caches.
For the Intel486 and Pentium processors, the LOCK# signal is always asserted on the bus during a LOCK operation, even if the area of memory being locked is cached in the processor.
For the P6 and more recent processor families, if the area of memory being locked during a LOCK operation is cached in the processor that is performing the LOCK operation as write-back memory and is completely contained in a cache line, the processor may not assert the LOCK# signal on the bus. Instead, it will modify the memory location internally and allow [its] cache coherency mechanism to insure that the operation is carried out atomically. This operation is called "cache locking." The cache coherency mechanism automatically prevents two or more processors that have the same area of memory from simultaneously modifying data in that area. (emphasis added)
Here we learn that the P6 and newer chips are smart enough to determine if they really have to block off the bus or can just rely on intelligent caching. I think this is a neat optimization.
I discussed this more in my blog post "How Do Locks Lock?"
Solution 2:
No, but it may force other processors to wait while this one accesses memory. Whether these wait states ever make a difference depend on the extent to which the processors are running off a cache.