Why do we need to disambiguate when adding an immediate value to a value at a memory address?

Unless we specify a size operator (such as byte or dword) when adding an immediate value to a value stored at a memory address, NASM reports an error.

section .data           ; Section containing initialized data

    memory_address: db "PIPPACHIP"

section .text           ; Section containing code

global  _start          ; Linker needs this to find the entry point!

_start:

23            mov ebx, memory_address
24            add [ebx], 32

........................................................

24:  error: operation size not specified. 

Fair’s fair.

I’m curious as to why this is so, however, since the two following segments of code yield the same result:

add byte [ebx], 32

or

add dword [ebx], 32

So what difference does it make, other than that using dword makes little sense in this instance? Is it simply because “NASM says so”, or is there some logic here that I am missing?

If the assembler can infer the operand size from a register name (for example, add [ebx], eax works), why not do the same for an immediate value, i.e. just go ahead and calculate the size of the immediate value upfront?

What is the requirement that means a size operator needs to be specified when adding an immediate value to a value at a memory address?

NASM version 2.11.08 Architecture x86


It does matter what operand-size you use for several reasons, and it would be weird and unintuitive / non-obvious to have the size implied by the integer value. It's a much better design to have NASM error when there's ambiguity because neither operand is a register.


As the two following segments of code will yield the same result:

add byte [ebx], 32
add dword [ebx], 32

They only yield the same result because 'P' + 32 doesn't carry into the next byte.
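
To see a case where the two forms differ, here is a minimal sketch in which the low byte is large enough for the addition to carry. The labels buf_b and buf_d and their contents are made up for illustration, and the exit sequence assumes 32-bit Linux.

```nasm
section .data
    ; Two hypothetical copies of the same 4 bytes (0x000001F0 as a little-endian dword)
    buf_b: db 0xF0, 0x01, 0x00, 0x00
    buf_d: db 0xF0, 0x01, 0x00, 0x00

section .text
global _start
_start:
    add  byte  [buf_b], 32  ; 0xF0 + 0x20 = 0x110, only the low 8 bits are stored
                            ; buf_b is now 10 01 00 00, and CF = 1
    add  dword [buf_d], 32  ; 0x000001F0 + 0x20 = 0x00000210
                            ; buf_d is now 10 02 00 00, and CF = 0

    mov  eax, 1             ; sys_exit (assumes 32-bit Linux)
    xor  ebx, ebx           ; exit status 0
    int  0x80
```

The byte version wraps within the first byte and leaves its neighbour alone; the dword version propagates the carry into the second byte.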

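Flags are set according to the result at the chosen operand size. If the 4th byte (the most significant byte of the dword) had its high bit set, then SF would be set for the dword version, whereas the byte version sets SF from bit 7 of the first byte only.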
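
A small sketch of that flag difference, using a made-up value whose low byte is 'P' and whose top byte has its high bit set (the labels val_b and val_d are hypothetical, and the exit sequence assumes 32-bit Linux):

```nasm
section .data
    val_b: dd 0x80000050    ; hypothetical value: low byte 0x50 ('P'), top byte 0x80
    val_d: dd 0x80000050

section .text
global _start
_start:
    add  byte  [val_b], 32  ; 8-bit result 0x70: bit 7 is clear, so SF = 0
    add  dword [val_d], 32  ; 32-bit result 0x80000070: bit 31 is set, so SF = 1

    mov  eax, 1             ; sys_exit (assumes 32-bit Linux)
    xor  ebx, ebx
    int  0x80
```

Both instructions leave the same bytes in memory here, yet a js or jns placed after each one would branch differently.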

re: comments about how CF works:

Carry-out from an add is always 0 or 1, i.e. the sum of two N-bit integers will always fit in an (N+1)-bit integer, where the extra bit is CF. Think of add eax, ebx as producing its result in CF:EAX, where each bit can be 0 or 1 depending on the input operands.
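
As a worked illustration of the CF:EAX view (the register values are chosen arbitrarily, and this is a fragment rather than a full program):

```nasm
section .text
    mov  eax, 0xFFFFFFFF
    mov  ebx, 1
    add  eax, ebx           ; true sum 0x1_0000_0000: EAX = 0x00000000, CF = 1

    mov  eax, 0xFFFFFFFF
    mov  ebx, 0xFFFFFFFF
    add  eax, ebx           ; true sum 0x1_FFFF_FFFE: EAX = 0xFFFFFFFE, CF = 1
                            ; even the largest inputs need only one extra bit
```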


Also, if ebx was pointing at the last byte in a page, then dword [ebx] could segfault (if the next page was unmapped), but byte [ebx] wouldn't.

This also has performance implications: read-modify-write of a byte can't store-forward to a dword load, and a dword read-modify-write accesses all 4 bytes. That matters for correctness, too: if another thread modified one of the other 3 bytes between this thread's load and its store, the dword version would write the old value back over it.


For these and various other reasons, it matters whether the opcode for the instruction that NASM assembles into the output file is the opcode for add r/m32, imm8 or add r/m8, imm8.
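
For reference, this is roughly what the different size keywords assemble to in 32-bit mode. The byte values in the comments follow the standard x86 encodings (80 /0 ib and 83 /0 ib, with ModRM 03 for [ebx]); they assume NASM picks the short sign-extended imm8 form for the word and dword cases, which is what its default optimization does as far as I know. A listing generated with nasm -l can confirm it for your version.

```nasm
bits 32
section .text
    add  byte  [ebx], 32    ; 80 03 20      add r/m8,  imm8
    add  word  [ebx], 32    ; 66 83 03 20   add r/m16, imm8 (operand-size prefix)
    add  dword [ebx], 32    ; 83 03 20      add r/m32, imm8 (sign-extended)
```

The immediate is the same single byte in all three, so there is nothing in the immediate itself for the assembler to infer a size from.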

It's a Good Thing that it forces you to be explicit about which one you mean instead of having some kind of default. Basing it on the size of the immediate would be confusing, too, especially when using an ASCII_casebit equ 0x20 constant. You don't want the operand-size of your instructions to change when you change a constant.
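
A sketch of that last point, reusing the question's string under a made-up label msg (the exit sequence assumes 32-bit Linux):

```nasm
ASCII_casebit equ 0x20      ; the bit that differs between 'P' and 'p'

section .data
    msg: db "PIPPACHIP"

section .text
global _start
_start:
    mov  ebx, msg
    add  byte [ebx], ASCII_casebit  ; 'P' (0x50) becomes 'p' (0x70)
                                    ; the size keyword, not the constant, fixes the operand size

    mov  eax, 1             ; sys_exit (assumes 32-bit Linux)
    xor  ebx, ebx
    int  0x80
```

Because byte is written at the use site, editing the equ line can never silently turn this into a wider store.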