Why can't I move a byte directly into a 64-bit register?

Why can't I directly move a byte from memory to a 64-bit register in Intel x86-64 assembly?

For instance, this code:

extern printf

global main

segment .text

main:
    enter   2, 0

    mov     byte [rbp - 1], 'A'
    mov     byte [rbp - 2], 'B'

    mov     r12, [rbp - 1]
    mov     r13, [rbp - 2]             

    xor     rax, rax           
    mov     rdi, Format                                                                                             
    mov     rsi, r12                                                                                                
    mov     rdx, r13                                                                                                
    call    printf                                                                                                  

    leave                                                                                                           
    ret                                                                                                             

segment .data                                                                                                       
Format:     db "%d %d", 10, 0

prints:

65 16706

To make the code work properly, I have to replace the two loads into r12 and r13 with this:

xor     rax, rax
mov     al, byte [rbp - 1]
mov     r12, rax
xor     rax, rax
mov     al, byte [rbp - 2]
mov     r13, rax

Now, it prints what is intended:

65 66

Why do we need to do this?

Is there a simpler way of doing this?

Thanks.

Solution 1:

Use move with zero or sign extension as appropriate.

For example: movzx eax, byte [rbp - 1] to zero-extend into RAX.

movsx rax, byte [rbp - 1] to sign-extend into RAX.
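
The original loads misbehave because NASM takes the operand size of mov r12, [rbp - 1] from the 64-bit destination, so it reads eight bytes starting at that address rather than one. The load from rbp - 2 therefore picks up 'B' and the adjacent 'A' together, which is where 16706 (0x4142) comes from. As a minimal sketch, the program from the question could be rewritten as follows. It assumes NASM on x86-64 Linux, linked through gcc. The 16-byte frame, the loads straight into esi and edx, the PLT call and the final xor eax, eax are changes made in this sketch for ABI conformance and are not part of the original.

```nasm
extern printf

global main

segment .text

main:
    enter   16, 0                   ; 16-byte frame keeps rsp aligned for the call

    mov     byte [rbp - 1], 'A'
    mov     byte [rbp - 2], 'B'

    lea     rdi, [rel Format]       ; 1st argument: address of the format string
    movzx   esi, byte [rbp - 1]     ; 2nd argument: 'A' zero-extended, rsi = 65
    movzx   edx, byte [rbp - 2]     ; 3rd argument: 'B' zero-extended, rdx = 66
    xor     eax, eax                ; variadic call: no vector registers in use
    call    printf wrt ..plt

    xor     eax, eax                ; return 0 from main
    leave
    ret

segment .data
Format:     db "%d %d", 10, 0
```

Assembled with nasm -f elf64 and linked with gcc, this prints 65 66.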

Solution 2:

Expanding 8-bit registers to 64-bit when assigning values

You can use the movzx instruction to move a byte to the 64-bit register.

In your case, it would be

movzx     r12, byte ptr [rbp - 1]
movzx     r13, byte ptr [rbp - 2]

Another way, which avoids accessing memory twice, would have been

mov       ax,  word ptr [rbp - 2]   ; al = byte at rbp - 2, ah = byte at rbp - 1
movzx     r13, al
movzx     r12, ah

but the last instruction cannot be assembled. See http://www.felixcloutier.com/x86/MOVZX.html: "In 64-bit mode, r/m8 can not be encoded to access the following byte registers if the REX prefix is used: AH, BH, CH, DH." Any instruction that names one of r8 to r15, or that uses a 64-bit operand size, needs a REX prefix.

So we have to do the following instead:

mov       ax,  word ptr [rbp - 2]   ; al = byte at rbp - 2, ah = byte at rbp - 1
movzx     r13, al
mov       al, ah
movzx     r12, al

However, just two movzx loads, as in the first example, may be faster (the processor may optimize the memory accesses). The speed depends on the surrounding code and should be measured in context.

You can take advantage of the fact that, in 64-bit mode, writing to a 32-bit register also clears the upper bits (63-32) of the full register. Even so, you cannot encode the ah register in a movzx whose destination is the 32-bit part of one of the registers introduced in 64-bit mode (movzx r13d, ah would not work).
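
One way around the ah restriction, sketched here in NASM syntax: movzx can still read ah when the destination is one of the legacy 32-bit registers, because that encoding needs no REX prefix. Using eax as the temporary is an assumption of this sketch. The register assignment follows the question (r12 receives the byte at rbp - 1, r13 the byte at rbp - 2).

```nasm
    movzx   eax, word [rbp - 2]     ; al = byte at rbp - 2, ah = byte at rbp - 1
    movzx   r13d, al                ; al can be encoded with REX; r13[63:32] is cleared
    movzx   eax, ah                 ; no REX prefix needed, so ah can be encoded
    mov     r12d, eax               ; 32-bit move zero-extends into r12
```

Whether this beats two separate byte loads is not something the sketch settles; it only shows which encodings are available.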

Using 8-bit, 16-bit, and 32-bit parts of 64-bit rNN registers

You can address the 8-bit, 16-bit, and 32-bit parts of the 64-bit rNN registers in the following way:

rNNb - byte
rNNw - word
rNNd - dword

for example, r10b, r10w, r10d. Here are some examples in code:

    xor     r8d,dword ptr [r9+r10*4]
    .....
    xor     r8b, al
    .....
    xor     eax, r11d

Please note: the rNN registers have no 'h' (high-byte) parts. Those exist only for four of the legacy registers, as ah, bh, ch and dh.

Another note: when you write to the 32-bit part of a 64-bit register, the upper 32 bits are automatically set to zero. Writes to the 8-bit and 16-bit parts leave the remaining bits unchanged.
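
The difference between 8-bit, 16-bit and 32-bit writes can be seen in a short NASM-syntax sketch. The constants are arbitrary values chosen for illustration; the comments show the full register contents after each instruction.

```nasm
    mov     rax, -1             ; rax = 0xFFFFFFFFFFFFFFFF
    mov     al, 0x41            ; rax = 0xFFFFFFFFFFFFFF41 (upper 56 bits preserved)
    mov     ax, 0x4142          ; rax = 0xFFFFFFFFFFFF4142 (upper 48 bits preserved)
    mov     eax, 0x41           ; rax = 0x0000000000000041 (upper 32 bits cleared)

    mov     r10, -1             ; r10 = 0xFFFFFFFFFFFFFFFF
    mov     r10b, 0x41          ; r10 = 0xFFFFFFFFFFFFFF41 (upper 56 bits preserved)
    mov     r10d, r10d          ; r10 = 0x00000000FFFFFF41 (upper 32 bits cleared)
```

This is why movzx with a 32-bit destination is enough to zero-extend a byte into the whole 64-bit register.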

The fastest way of working with the registers

The fastest way of working with the registers is to always clear the upper bits, which removes the false dependency on the previous contents of the register. This is the approach recommended by Intel, and it allows better Out-of-Order Execution (OOE) and Register Renaming (RR). In addition, working with full registers rather than with their lower parts is faster on modern processors such as Knights Landing and Cannon Lake. So this is the code that will run faster on these processors (it makes use of OOE and RR):

movzx     rax, word ptr [rbp - 2]   ; al = byte at rbp - 2, next byte = byte at rbp - 1
movzx     r13, al
shr       rax, 8
mov       r12, rax

As for Knights Landing and future mainstream processors such as Cannon Lake: Intel states explicitly that instructions operating on 8-bit and 16-bit registers will be much slower than those operating on 32-bit or 64-bit registers on Cannon Lake, as they already are on Knights Landing.

If you write with OOE and RR in mind, your assembly code will be much faster.
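
To make the false-dependency point concrete, here is a small NASM-syntax sketch contrasting a partial-register load with a full-width one. The comments describe microarchitectures that do not rename al separately from rax. How much this matters on a given CPU is something to measure rather than assume.

```nasm
    ; Partial write: only al changes, so the result has to be merged with
    ; whatever rax held before. This load depends on the old value of rax.
    mov     al, [rbp - 1]

    ; Full-width write: the whole of rax is replaced, so the load does not
    ; have to wait for any earlier instruction that produced rax.
    movzx   eax, byte [rbp - 1]
```

The xor rax, rax before mov al, ... in the question serves the same purpose of cutting the dependency, at the cost of an extra instruction.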