A couple of questions about [base + index*scale + disp]
The general form for memory addressing in Intel and AT&T syntax is the following:
[base + index*scale + disp]
disp(base, index, scale)
My questions are the following:
- Can base and index be any register?
- What values can scale take? Is it 1, 2, 4 and 8 (with 1 being the default)?
- Are index and disp interchangeable (with the only difference being that index is a register while disp is an immediate value)?
Solution 1:
This is described in Intel's manual:
3.7.5 Specifying an Offset
The offset part of a memory address can be specified directly as a static value (called a displacement) or through an address computation made up of one or more of the following components:
- Displacement — An 8-, 16-, or 32-bit value.
- Base — The value in a general-purpose register.
- Index — The value in a general-purpose register. [can't be ESP/RSP]
- Scale factor — A value of 2, 4, or 8 that is multiplied by the index value.
The offset which results from adding these components is called an effective address.
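To make the component list concrete, here is a small Python model of the address computation. It is only a sketch and does not come from Intel's manual: the function name effective_address32 is hypothetical, it assumes a 32-bit addressing mode, it treats a missing component as 0, and it truncates the sum to 32 bits.

```python
MASK32 = 0xFFFF_FFFF
VALID_SCALES = (1, 2, 4, 8)

def effective_address32(base: int, index: int, scale: int, disp: int) -> int:
    """Model base + index*scale + disp for a 32-bit addressing mode.

    Pass 0 for any component that is absent (no base, no index, no disp).
    The sum is truncated to 32 bits, like a 32-bit effective address.
    """
    if scale not in VALID_SCALES:
        raise ValueError(f"scale {scale} is not encodable; use 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK32

# 8(%ebx,%esi,4) with EBX = 0x1000 and ESI = 3
print(hex(effective_address32(0x1000, 3, 4, 8)))   # 0x1014
# -4(%ebp) with EBP = 0x2000 and no index register
print(hex(effective_address32(0x2000, 0, 1, -4)))  # 0x1ffc
```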
The scale factor is encoded as a 2-bit shift count (0, 1, 2, 3) for scale factors of 1, 2, 4, or 8. And yes, *1 (shift count = 0) is the default if you write (%edi, %edx); that's equivalent to (%edi, %edx, 1).
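A minimal sketch of that encoding follows. It assumes the standard 3-bit register numbering (EAX=0 through EDI=7). The scale becomes the 2-bit ss field at the top of the SIB byte, and multiplying by the scale is the same as shifting left by ss. The helper name sib_byte and its default scale=1, which mirrors the assembler default, are illustrative choices.

```python
# Each legal scale factor maps to the 2-bit shift count stored in the
# SIB byte's "ss" field (bits 7..6).
SCALE_TO_SS = {1: 0, 2: 1, 4: 2, 8: 3}

# 3-bit register numbers used in ModRM/SIB fields (32-bit names).
EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI = range(8)

def sib_byte(index_reg: int, base_reg: int, scale: int = 1) -> int:
    """Pack ss (2 bits) | index (3 bits) | base (3 bits) into one byte.

    scale defaults to 1, like the assembler default for (%edi,%edx).
    """
    ss = SCALE_TO_SS[scale]  # KeyError for unencodable scales such as 3
    return (ss << 6) | ((index_reg & 7) << 3) | (base_reg & 7)

# Multiplying by the scale is the same as shifting left by ss.
for scale, ss in SCALE_TO_SS.items():
    assert 5 * scale == 5 << ss

# (%edi,%edx) and (%edi,%edx,1) produce the same SIB byte.
print(hex(sib_byte(EDX, EDI)))     # 0x17
print(hex(sib_byte(EDX, EDI, 1)))  # 0x17
```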
In AT&T syntax, it's disp(base, index, scale); constants go outside the parentheses. Some Intel-syntax assemblers also allow syntax like 1234[ebx], but others don't. AT&T syntax, on the other hand, is rigid: every component of the addressing mode can only go in its proper place. For example:
movzwl foo-0x10(,%edx,2), %eax
does a zero-extending 16-bit ("word") load into EAX from the address foo-0x10 + edx*2. EDX is the index register, with scale factor 2. There is no base register. foo and -0x10 are both part of the displacement, and both are link-time constants: foo is a symbol address that the linker will fill in, with 0x10 subtracted because of the -0x10 assemble-time offset.
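Modeled in Python, the load looks roughly like this. The memory image and the address 0x40 for foo are made up for the example, since in a real program the linker picks foo's address. The point is that the whole displacement foo-0x10 collapses into one constant that is added to edx*2, with no base register involved.

```python
import struct

def movzwl(memory: bytearray, address: int) -> int:
    """Model movzwl: load a little-endian 16-bit word, zero-extend to 32 bits."""
    (word,) = struct.unpack_from("<H", memory, address)
    return word  # always 0..0xFFFF, so the upper 16 bits of EAX are 0

# Hypothetical layout: pretend the linker placed foo at address 0x40
# in a small flat memory image.
FOO = 0x40
memory = bytearray(0x100)
struct.pack_into("<4H", memory, FOO - 0x10, 0x1111, 0x2222, 0xBEEF, 0x4444)

edx = 2              # index register
scale = 2            # scale factor from (,%edx,2)
disp = FOO - 0x10    # foo-0x10 folds into a single displacement constant

# No base register: address = disp + index*scale
eax = movzwl(memory, disp + edx * scale)
print(hex(eax))  # 0xbeef
```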
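If you have the choice, use just a base instead of an index with a scale of 1. An index requires a SIB byte to encode, making the instruction longer. That's why compilers choose addressing modes like 8(%ebp) to access stack memory, not 8(,%ebp). An index with no base register is even more expensive, because that form always needs a 32-bit displacement: 8(,%ebp) costs a SIB byte plus a disp32, where 8(%ebp) needs only a disp8.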
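The size difference is easy to see by hand-encoding. Below is a simplified Python sketch of the 32-bit ModRM/SIB rules. The function name modrm_bytes and its keyword interface are made up for illustration, and the sketch ignores prefixes, 16-bit addressing, and 64-bit REX/RIP-relative details. It shows that 8(%ebp) needs only ModRM + disp8, while the index-only form 8(,%ebp,1) needs a SIB byte and, because there is no base, a full disp32.

```python
import struct

EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI = range(8)
SCALE_TO_SS = {1: 0, 2: 1, 4: 2, 8: 3}

def modrm_bytes(reg, base=None, index=None, scale=1, disp=0):
    """Return ModRM [+ SIB] [+ disp] bytes for a 32-bit memory operand.

    A simplified sketch of the 32-bit rules only; base/index are
    3-bit register numbers, or None when the component is absent.
    """
    if index == ESP:
        raise ValueError("ESP cannot be an index register")

    def disp_field(allow_none):
        # Pick mod and displacement bytes for a mode that has a base.
        if disp == 0 and allow_none:
            return 0b00, b""
        if -128 <= disp <= 127:
            return 0b01, struct.pack("<b", disp)
        return 0b10, struct.pack("<i", disp)

    if base is None and index is None:
        # Absolute [disp32]: mod=00, rm=101.
        return bytes([(reg << 3) | 0b101]) + struct.pack("<i", disp)

    if index is None and base != ESP:
        # Base only, no SIB. EBP with mod=00 would mean "disp32, no base",
        # so an EBP base always needs at least a disp8.
        mod, d = disp_field(allow_none=(base != EBP))
        return bytes([(mod << 6) | (reg << 3) | base]) + d

    # Everything else needs a SIB byte (rm=100).
    ss = SCALE_TO_SS[scale]
    idx = 0b100 if index is None else index  # index field 100 = "no index"
    if base is None:
        # No base: mod=00 with SIB base=101 forces a disp32.
        sib = (ss << 6) | (idx << 3) | 0b101
        return bytes([(reg << 3) | 0b100, sib]) + struct.pack("<i", disp)
    mod, d = disp_field(allow_none=(base != EBP))
    sib = (ss << 6) | (idx << 3) | base
    return bytes([(mod << 6) | (reg << 3) | 0b100, sib]) + d

# Prefix the opcode 0x8B (mov r32, r/m32) to compare whole instructions.
print((b"\x8b" + modrm_bytes(EAX, base=EBP, disp=8)).hex(" "))
# 8b 45 08
print((b"\x8b" + modrm_bytes(EAX, index=EBP, scale=1, disp=8)).hex(" "))
# 8b 04 2d 08 00 00 00
```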
See also "Referencing the contents of a memory location. (x86 addressing modes)" for more about when you might use a base, an index, and/or a displacement.
A 16-bit displacement is only encodable in a 16-bit addressing mode, which uses a different format that can't include a scale factor and has a very limited selection of registers that can serve as a base or index. So in 32-bit machine code, a mode like 1234(%edx) has to encode the 1234 as a 32-bit disp32, since 1234 is too large for an 8-bit displacement.
Displacements from -128 to +127 can use the short 8-bit (disp8) encoding, which the CPU sign-extends. Your assembler takes care of this for you, using the shortest valid encoding for each displacement.
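The assembler's choice can be sketched as a simple range check. This is an illustrative helper (disp_size is a made-up name) for a signed displacement in a 32-bit-mode operand that has a base register. It also accounts for the EBP case, where even a zero offset needs a disp8, and it shows how a disp8 byte is read as a signed value.

```python
def disp_size(disp: int, base_is_ebp: bool = False) -> int:
    """Bytes needed for a signed displacement in a 32-bit addressing
    mode that has a base register.

    EBP as a base has no "no displacement" form (mod=00 with EBP means
    disp32 with no base), so a zero offset still costs a disp8 there.
    """
    if disp == 0 and not base_is_ebp:
        return 0                      # mod=00: no displacement bytes
    if -128 <= disp <= 127:
        return 1                      # disp8
    if -2**31 <= disp <= 2**31 - 1:
        return 4                      # disp32
    raise ValueError("displacement does not fit in 32 bits")

for d in (0, 127, -128, 128, 1234):
    print(d, disp_size(d))
# 0 0
# 127 1
# -128 1
# 128 4
# 1234 4
print(disp_size(0, base_is_ebp=True))  # 1

# The CPU sign-extends a disp8: the byte 0xF0 means -16.
print(int.from_bytes(b"\xf0", "little", signed=True))  # -16
```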
All of this works the same way in 64-bit mode for 64-bit addressing modes, with disp32 also being sign-extended to 64 bits, just like disp8.
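Here is a sketch of that sign extension in a 64-bit effective-address calculation. It is again a hypothetical Python model with made-up register values. The 32-bit displacement field 0xFFFFFFF0 is read as -16, not as a large positive offset.

```python
MASK64 = (1 << 64) - 1

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def effective_address64(base: int, index: int, scale: int, disp32_field: int) -> int:
    # The disp32 field is sign-extended to 64 bits before the add;
    # the result wraps modulo 2^64.
    return (base + index * scale + sign_extend(disp32_field, 32)) & MASK64

# Encoded bytes F0 FF FF FF (0xFFFFFFF0) mean -16, i.e. -16(%rbp).
rbp = 0x7FFF_FFFF_E000
print(hex(effective_address64(rbp, 0, 1, 0xFFFF_FFF0)))  # 0x7fffffffdff0
print(sign_extend(0x80, 8))  # -128, the same rule applied to a disp8
```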