Moving 64bit constants to memory
Looks like you didn't check for asmjit errors. The docs say there's a
kErrorInvalidImmediate
- Invalid immediate (out of bounds on X86 and invalid pattern on ARM).
The only x86-64 instruction that can use a 64-bit immediate is mov
-immediate to register, the special no-modrm opcode that gives us 5-byte mov eax, 12345
, or 10-byte mov rax, 0x0123456789abcdef
, where a REX.W prefix changes that opcode to look for a 64-bit immediate. See https://www.felixcloutier.com/x86/mov and the Q&A "why we can't move a 64-bit immediate value to memory?"
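To make the two encodings concrete, here is a minimal hand-rolled sketch of that no-ModRM opcode (B8+rd). The function names are made up for illustration and have nothing to do with asmjit's API. It only shows how the byte counts come out to 5 and 10.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: mov r32, imm32 is opcode B8+rd followed by 4 immediate bytes.
std::vector<std::uint8_t> encode_mov_r32_imm32(unsigned reg, std::uint32_t imm) {
    std::vector<std::uint8_t> out;
    if (reg >= 8) out.push_back(0x41);                 // REX.B selects r8d..r15d
    out.push_back(static_cast<std::uint8_t>(0xB8 + (reg & 7)));
    for (int i = 0; i < 4; ++i)                        // little-endian imm32
        out.push_back(static_cast<std::uint8_t>(imm >> (8 * i)));
    return out;
}

// Hypothetical helper: same opcode with REX.W, which makes the immediate 8 bytes.
std::vector<std::uint8_t> encode_mov_r64_imm64(unsigned reg, std::uint64_t imm) {
    std::vector<std::uint8_t> out;
    out.push_back(static_cast<std::uint8_t>(0x48 | (reg >= 8 ? 0x01 : 0x00))); // REX.W (+REX.B)
    out.push_back(static_cast<std::uint8_t>(0xB8 + (reg & 7)));
    for (int i = 0; i < 8; ++i)                        // little-endian imm64
        out.push_back(static_cast<std::uint8_t>(imm >> (8 * i)));
    return out;
}

// Self-check for the two examples from the text (register number 0 = eax/rax).
void check_mov_encodings() {
    assert(encode_mov_r32_imm32(0, 12345).size() == 5);
    assert(encode_mov_r64_imm64(0, 0x0123456789abcdefull).size() == 10);
}
```

No other opcode has an 8-byte immediate field, which is why there's nothing comparable to emit for a memory destination.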
Your title is a red herring. It's nothing to do with having an m64
operand for and
, it's the constant that's the problem. You can verify that by single-stepping the asm with a debugger and checking both operands before the and
, including the one in memory. (It's probably -1
from 0xFFFFFFFF
as an immediate for mov m64, sign_extended_imm32
, which would explain AND not changing the value in R14).
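A quick model of that guess, in plain C++ rather than asmjit: it assumes the encoder kept only the low 32 bits of your constant (I haven't verified that's what asmjit actually did), and shows what the CPU then does with those 4 bytes. The function names are just for illustration.

```cpp
#include <cassert>
#include <cstdint>

// What the CPU does with the 4-byte immediate of `mov m64, imm32`:
// treat it as signed and widen it to 64 bits.
inline std::uint64_t sign_extend_imm32(std::uint32_t imm32) {
    return static_cast<std::uint64_t>(
        static_cast<std::int64_t>(static_cast<std::int32_t>(imm32)));
}

// Assumed scenario: 0xFFFFFFFFFFFF was truncated to 0xFFFFFFFF before encoding.
inline void check_truncated_mask() {
    const std::uint64_t mask = sign_extend_imm32(0xFFFFFFFFu);
    assert(mask == 0xFFFFFFFFFFFFFFFFull);           // i.e. -1 in memory
    const std::uint64_t r14 = 0xABCD123456789ABCull; // arbitrary test value
    assert((r14 & mask) == r14);                     // AND leaves R14 unchanged
}
```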
Also, disassembling the JITed machine code should show you what immediate is actually encoded; a debugger can show you that too as you single-step through it.
Use your temporary register for the constant (like mov r14, 0xFFFFFFFFFFFF
), then and reg,mem
to load-and-mask.
Or better, if the target machine you're JITing for has BMI1 andn
, construct the inverted constant once outside a loop with mov r13, ~0xFFFFFFFFFFFF
then inside the loop use andn r14, r13, [r15+32]
which does a load+and without destroying the mask, all with one instruction which can decode to a single uop on Intel/AMD CPUs.
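If the operand order of andn is confusing, a small software model may help: the first source is the one that gets inverted, which is why the register holds the inverted mask. This is only a sketch of the arithmetic with made-up names, not JIT code.

```cpp
#include <cassert>
#include <cstdint>

// Software model of BMI1 `andn dest, src1, src2`: dest = ~src1 & src2.
inline std::uint64_t andn_model(std::uint64_t src1, std::uint64_t src2) {
    return ~src1 & src2;
}

inline void check_andn_mask() {
    const std::uint64_t inverted_mask = ~0xFFFFFFFFFFFFull; // r13, built once outside the loop
    const std::uint64_t loaded = 0xABCD123456789ABCull;     // stands in for the qword at [r15+32]
    // andn r14, r13, [r15+32] keeps the low 48 bits and leaves r13 intact.
    assert(andn_model(inverted_mask, loaded) == 0x0000123456789ABCull);
}
```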
Or if you can't reuse a constant register over a loop, maybe mov reg,imm64
, then push reg
or mov mem,reg
and use that in future AND instructions. Or emit some constant data somewhere near enough to reference with a RIP-relative addressing mode, although that takes a bit more code-size at every and
instruction. (ModRM + 4 byte rel32, vs. ModRM + SIB + 0 or 1 bytes for data on the stack close to RSP).
BTW, if you're just truncating instead of sign-extending, you're also assuming this address is in the low half of virtual address space (i.e. user-space). That's fine, though. Fun fact: future x86 CPUs (first Sapphire Rapids) will have an optional feature that OSes can enable to transparently ignore the high bits, except for the MSB: LAM = Linear Address Masking. See Intel's future-extensions manual.
So if this feature is enabled with 48-bit masking for user-space, you can skip the AND masking entirely, as long as your code makes sure bit 47 matches bit 63. (You might want to keep the top bit unmodified or 0 so your code can take advantage of LAM when available to save instructions.)
If you were masking to keep the low 32, you could just mov r14d, [r15+32]
to zero-extend the low dword of the value into 64-bit R14. But for keeping the low 48 or 57 bits, you need a mask or BMI2 bzhi
with 48
in a register.
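For reference, here's a rough C++ model of those two options, with made-up function names. The bzhi model follows my reading of the instruction (only the low 8 bits of the index are used, and an index of 64 or more leaves the source unchanged), so check it against the manual before relying on the edge cases.

```cpp
#include <cassert>
#include <cstdint>

// Model of a 32-bit load like `mov r14d, [mem]`: the upper 32 bits become zero.
inline std::uint64_t zero_extend_low32(std::uint64_t value) {
    return static_cast<std::uint32_t>(value);
}

// Model of BMI2 `bzhi dest, src, index`: clear all bits from position n upward.
inline std::uint64_t bzhi_model(std::uint64_t src, std::uint64_t index_reg) {
    const unsigned n = static_cast<unsigned>(index_reg & 0xFF); // low 8 bits of the index
    if (n >= 64) return src;                                    // nothing to clear
    return src & ((std::uint64_t{1} << n) - 1);
}

inline void check_low_bit_masks() {
    const std::uint64_t v = 0xABCD123456789ABCull;
    assert(zero_extend_low32(v) == 0x56789ABCull);
    assert(bzhi_model(v, 48) == (v & 0xFFFFFFFFFFFFull)); // same as the AND mask
}
```

So bzhi with 48 in a register gives the same result as the 48-bit AND mask, and only needs a small constant that fits in a normal mov r32, imm32.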