Moving 64bit constants to memory

Looks like you didn't check for asmjit errors. The docs say there's a kErrorInvalidImmediate - Invalid immediate (out of bounds on X86 and invalid pattern on ARM).

The only x86-64 instruction that can use a 64-bit immediate is mov-immediate to register, the special no-modrm opcode that gives us 5-byte mov eax, 12345, or 10-byte mov rax, 0x0123456789abcdef, where a REX.W prefix changes that opcode to look for a 64-bit immediate. See https://www.felixcloutier.com/x86/mov and the related Q&A "why we can't move a 64-bit immediate value to memory?"


Your title is a red herring. It's nothing to do with having an m64 operand for and, it's the constant that's the problem. You can verify that by single-stepping the asm with a debugger and checking both operands before the and, including the one in memory. (It's probably -1 from 0xFFFFFFFF as an immediate for mov m64, sign_extended_imm32, which would explain AND not changing the value in R14).

Also disassembly of the JITed machine code should show you what immediate is actually encoded; again a debugger could provide that as you single-step through it.
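To make the suspected sign-extension concrete, here is a minimal plain-C++ sketch (no asmjit involved) of what the CPU does with a 32-bit immediate in a 64-bit instruction. The helper names signExtendImm32 and fitsSignExtendedImm32 are hypothetical, made up for this illustration; a JIT could use a check like the second one to decide whether a constant needs mov reg, imm64 first.

```cpp
#include <cassert>
#include <cstdint>

// Model of how a 64-bit operand-size instruction interprets an imm32:
// the 32-bit immediate is sign-extended to 64 bits.
// (Hypothetical helper name, not part of asmjit.)
constexpr std::uint64_t signExtendImm32(std::uint32_t imm32) {
    return static_cast<std::uint64_t>(
        static_cast<std::int64_t>(static_cast<std::int32_t>(imm32)));
}

// True if `value` survives a round trip through a sign-extended imm32,
// i.e. it can be encoded directly in and/mov with a memory destination.
constexpr bool fitsSignExtendedImm32(std::uint64_t value) {
    return signExtendImm32(static_cast<std::uint32_t>(value)) == value;
}

// 0xFFFFFFFF as an imm32 becomes all-ones, so AND with it changes nothing.
static_assert(signExtendImm32(0xFFFFFFFFu) == 0xFFFFFFFFFFFFFFFFull);
// The 48-bit mask does not fit, so it needs a register (or memory) operand.
static_assert(!fitsSignExtendedImm32(0xFFFFFFFFFFFFull));
```

If the immediate really ended up as 0xFFFFFFFF, the mask in memory is -1 and the AND is a no-op, which matches the symptom.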


Use your temporary register for the constant (like mov r14, 0xFFFFFFFFFFFF), then and reg,mem to load-and-mask.

Or better, if the target machine you're JITing for has BMI1 andn, construct the inverted constant once outside a loop with mov r13, ~0xFFFFFFFFFFFF then inside the loop use andn r14, r13, [r15+32] which does a load+and without destroying the mask, all with one instruction which can decode to a single uop on Intel/AMD CPUs.
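The reason the constant has to be inverted is that andn complements its first source operand. A small C++ model of the semantics, as a sketch only (the function and constant names are invented for this example, and it uses plain integer ops rather than the real instruction):

```cpp
#include <cassert>
#include <cstdint>

// Model of BMI1 `andn dest, src1, src2`: dest = ~src1 & src2.
constexpr std::uint64_t andn(std::uint64_t src1, std::uint64_t src2) {
    return ~src1 & src2;
}

constexpr std::uint64_t kLow48Mask = 0xFFFFFFFFFFFFull;
// What `mov r13, ~0xFFFFFFFFFFFF` would leave in r13.
constexpr std::uint64_t kInvertedMask = ~kLow48Mask;

// Stand-in for `andn r14, r13, [r15+32]`: load from memory and mask,
// leaving the (inverted) mask value untouched for the next iteration.
inline std::uint64_t loadAndMask(const std::uint64_t* mem) {
    return andn(kInvertedMask, *mem);
}
```

Since andn writes a separate destination, r13 stays valid across the whole loop.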

Or if you can't reuse a constant register over a loop, maybe mov reg,imm64, then push reg or mov mem,reg and use that in future AND instructions. Or emit some constant data somewhere near enough to reference with a RIP-relative addressing mode, although that takes a bit more code-size at every and instruction. (ModRM + 4 byte rel32, vs. ModRM + SIB + 0 or 1 bytes for data on the stack close to RSP).


BTW, if you're just truncating instead of sign-extending, you're also assuming this address is in the low half of virtual address space (i.e. user-space). That's fine, though. Fun fact: future x86 CPUs (first Sapphire Rapids) will have an optional feature that OSes can enable to transparently ignore the high bits, except for the MSB: LAM = Linear Address Masking. See Intel's future-extensions manual.

So if this feature is enabled with 48-bit masking for user-space, you can skip the AND masking entirely. (If your code makes sure bit 47 matches bit 63; you might want to keep the top bit unmodified or 0 so your code can take advantage of LAM when available to save instructions).
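As a rough sketch of the condition mentioned above (bit 47 matching bit 63), a JIT or a debug build could check tagged pointers with something like the following. This only models the rule as stated here, with a made-up helper name; consult Intel's documentation for the exact LAM requirements before relying on it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: does bit 63 of a tagged pointer match bit 47?
// Bits 62:48 are the ones assumed to be ignored with 48-bit masking.
constexpr bool topBitMatchesBit47(std::uint64_t taggedPtr) {
    return ((taggedPtr >> 63) & 1u) == ((taggedPtr >> 47) & 1u);
}
```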


If you were masking to keep the low 32, you could just mov r14d, [r15+32] to zero-extend the low dword of the value into 64-bit R14. But for keeping the low 48 or 57 bits, you need a mask or BMI2 bzhi with 48 (or 57) in a register.
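For comparison, here is a small C++ model of both options: the implicit zero-extension you get from a 32-bit load, and what bzhi computes for a given bit index. It is a sketch of the instruction semantics using ordinary integer ops, with names chosen for this example.

```cpp
#include <cassert>
#include <cstdint>

// Model of `mov r14d, [mem]`: writing a 32-bit register zero-extends
// into the full 64-bit register, keeping only the low dword.
constexpr std::uint64_t zeroExtendLow32(std::uint64_t value) {
    return static_cast<std::uint32_t>(value);
}

// Model of BMI2 `bzhi dest, src, index`: clear all bits of src at
// positions >= index. Only the low 8 bits of the index register are
// used, and an index of 64 or more leaves src unchanged.
constexpr std::uint64_t bzhi(std::uint64_t src, std::uint64_t index) {
    const unsigned n = static_cast<unsigned>(index & 0xFF);
    if (n >= 64) return src;
    return src & ((std::uint64_t{1} << n) - 1);
}
```

With 48 in the index register this gives the same result as AND with 0xFFFFFFFFFFFF, but the index is a small constant that a 5-byte mov r32, imm32 can materialize.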