What x86 instructions take two (or more) memory operands?

I thought there were none, but I see this claim here:

"Instructions with two memory operands are extremely rare"

I can't find anything that explains what instructions, though rare, exist. What are the exceptions?


Solution 1:

"I can't find anything that explains the rarity."

An x86 instruction can have at most one ModR/M + SIB + disp0/8/32. So there are zero instructions with two explicit memory operands.
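To make the "at most one ModR/M" point concrete, here is a minimal sketch that splits a ModR/M byte into its three fields. The helper names split_modrm and rm_is_memory are made up for this illustration. Only the r/m field (together with mod, and optionally a SIB byte and displacement) can name memory; the reg field can only ever select a register or act as an opcode extension.

```python
def split_modrm(modrm):
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    if not 0 <= modrm <= 0xFF:
        raise ValueError("ModR/M is a single byte")
    mod = modrm >> 6            # bits 7..6: addressing form
    reg = (modrm >> 3) & 0b111  # bits 5..3: register number or /digit opcode extension
    rm = modrm & 0b111          # bits 2..0: register, or base of the memory operand
    return mod, reg, rm


def rm_is_memory(modrm):
    """True if the r/m field selects a memory operand (mod != 11b)."""
    mod, _, _ = split_modrm(modrm)
    return mod != 0b11


# FF 37 is push qword [rdi] in 64-bit mode: FF /6 with mod=00, rm=111 (rdi).
print(split_modrm(0x37), rm_is_memory(0x37))
# 8B C3 is mov eax, ebx: mod=11, so r/m names a register, not memory.
print(split_modrm(0xC3), rm_is_memory(0xC3))
```

There is exactly one rm field per instruction, so any second memory operand has to come from somewhere other than the addressing-mode encoding.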

The x86 memory-memory instructions all have at least one implicit memory operand whose location is baked into the opcode, like push, which accesses the stack, or the string instructions movs and cmps.

"What are the exceptions?"

I'll use [mem] to indicate a ModR/M addressing mode which can be [rdi], [RIP+whatever], [ebx+eax*4+1234], or whatever you like.

  • push [mem]: reads [mem], writes implicit [rsp] (after updating rsp).
  • pop [mem]: reads implicit [rsp] (then updates rsp), writes [mem].
  • call [mem]: reads a new RIP from [mem], pushes a return address on the stack.
  • movsb/w/d/q: reads DS:(E)SI, writes ES:(E)DI (or in 64-bit mode RSI and RDI). Both are implicit; only the DS segment reg is overridable. Usable with rep.
  • cmpsb/w/d/q: reads DS:(E)SI and ES:(E)DI (or in 64-bit mode RSI and RDI). Both are implicit; only the DS segment reg is overridable. Usable with repe / repne.

  • MPX bndstx mib, bnd: "Store the bounds in bnd and the pointer value in the index register of mib to a bound table entry (BTE) with address translation using the base of mib." The Operation section shows a load and a store, but I don't know enough about MPX to grok it.

  • movdir64b r16/r32/r64, m512. It has its own feature bit and is available in upcoming Tremont (successor to Goldmont Plus Atom). Moves 64 bytes as a direct store (WC) with 64-byte write atomicity from the source memory address to the destination memory address. The destination (aligned, written atomically) is ES:[reg], using the register from the /r field of ModRM; the source (may be unaligned, read non-atomically) is the memory operand from the r/m field of ModRM.

    Uses write-combining for the store, see the description. It's the first time any x86 CPU vendor has guaranteed atomicity wider than 8 bytes outside of lock cmpxchg16b. But unfortunately it's not actually great for multithreading because it forces NT-like cache eviction/bypass behaviour, so other cores will have to read it from DRAM instead of a shared outer cache.
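As a rough illustration of what "implicit memory operand" means for the push and movs entries in the list above, here is a toy Python model. It assumes a flat bytearray as memory, a plain dict as the register file, DF=0, 64-bit operand size for push, and no segmentation or faults. The names rep_movsb and push_mem are invented for this sketch.

```python
def rep_movsb(mem, regs):
    """Toy model of `rep movsb` with DF=0: neither address appears in the encoding."""
    while regs["rcx"] != 0:
        mem[regs["rdi"]] = mem[regs["rsi"]]  # implicit load [rsi], implicit store [rdi]
        regs["rsi"] += 1
        regs["rdi"] += 1
        regs["rcx"] -= 1


def push_mem(mem, regs, addr):
    """Toy model of `push qword [addr]`: one explicit load, one implicit store."""
    value = mem[addr:addr + 8]                   # explicit ModR/M memory operand
    regs["rsp"] -= 8                             # rsp is updated first
    mem[regs["rsp"]:regs["rsp"] + 8] = value     # implicit store to the new [rsp]


mem = bytearray(64)
mem[0:4] = b"abcd"
regs = {"rsi": 0, "rdi": 16, "rcx": 4, "rsp": 64}

rep_movsb(mem, regs)
push_mem(mem, regs, 0)
print(bytes(mem[16:20]), regs)
```

In both cases the second memory access is addressed through a register fixed by the opcode (rsp, rsi, rdi), which is why these instructions can touch two locations while still encoding at most one addressing mode.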

AVX2 gather and AVX512 scatter instructions are debatable. They obviously do multiple loads / stores, but all the pointers come from one SIMD vector (and a scalar base).
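To show why gathers are a borderline case, here is a simplified Python model of the address calculation for a dword gather. It is only a sketch: it ignores masking, the displacement, and address-size wraparound, and the names gather_addresses and gather_dwords are hypothetical. Every element address comes from the same single addressing mode: one scalar base plus one lane of the index vector times the scale.

```python
def gather_addresses(base, indices, scale):
    """Per-element addresses: base + index * scale, with scale in {1, 2, 4, 8}."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return [base + i * scale for i in indices]


def gather_dwords(mem, base, indices, scale):
    """Load one little-endian dword per index (all elements treated as unmasked)."""
    return [
        int.from_bytes(mem[a:a + 4], "little")
        for a in gather_addresses(base, indices, scale)
    ]


# Memory holding the dwords 0, 10, 20, ..., 70.
mem = bytearray()
for n in range(8):
    mem += (n * 10).to_bytes(4, "little")

print(gather_addresses(0, [3, 0, 5], 4))
print(gather_dwords(mem, 0, [3, 0, 5], 4))
```

So a gather performs several loads, but the instruction still encodes only one memory operand (a ModR/M + SIB form whose index register happens to be a vector).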

I'm not counting instructions like pusha, fldenv, xsaveopt, iret, or enter with nesting level > 1 that do multiple stores or loads to a contiguous block.

I'm also not counting the ins / outs string instructions, because they copy memory to/from I/O space. I/O space isn't memory.

I didn't look at VMX or SGX instructions on http://felixcloutier.com/x86/index.html, just the main list. I don't think I missed any, but I certainly could have.