How to load vector registers from integer registers in Arm64? (M1)

This is a question about SIMD instructions on AArch64 on an M1.

I am working on a routine that works entirely inside the registers. All the memory reads and writes occur outside of the main loop. The first routine loads pseudo-random bits into registers x14-x22 (excluding x18).

I cannot figure out how to get that series of bits into the v5-v8 vector registers without writing them to memory first, and I do not want to do that. Asking me why won't be particularly helpful.

I'm sure there is a simple way to do this, but I cannot find it in any of my resources.

    fmov    d5, x14
    rev64   v5.2d, v5.2d    <--- error!
    ror     q5, q5, #8      <--- error!
    fmov    d6, x16
    fmov    d6, x17
    fmov    d7, x19
    fmov    d7, x20
    fmov    d8, x21
    fmov    d8, x22

In the above code, I'm able to load the lower 64 bits of each vector register with what I want, but I cannot figure out how to rotate those bits over into the upper 64 bits.

In 32-bit Arm you can stack these directly.


Already answered in comments by Peter Cordes, just promoting to an answer:

You want the ins instruction. It moves a general-purpose register into a specified element of a vector register, leaving the other elements unchanged.

fmov d6, x16     // move x16 into d6, which is the low half of v6; high half is zeroed
ins v6.d[1], x17 // insert x17 into high half of v6; leave low half unchanged

You can also write mov v6.d[1], x17 which is an assembler alias for the same thing. (The instruction will disassemble as mov.)

You might think that it would be more natural to write

ins v6.d[0], x16
ins v6.d[1], x17

but then you would have a false input dependency on the previous value of v6. The fmov, since it zeroes the rest of the vector register, ensures that the previous value of v6 is irrelevant, and out-of-order execution need not wait for it to be ready.
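Applied to the registers from the question, the same fmov / ins pattern might look like the sketch below. The question does not say which integer register belongs in which half of which vector register, so the pairing used here (x14/x15 into v5, x16/x17 into v6, x19/x20 into v7, x21/x22 into v8, low half first) is an assumption; swap the operands around to suit the layout you actually need.

```asm
// Assumed layout: vN = { low half, high half }; x18 is skipped, as in the question.
fmov d5, x14        // v5 low half  = x14, high half zeroed (no dependency on old v5)
ins  v5.d[1], x15   // v5 high half = x15, low half unchanged

fmov d6, x16        // v6 low half  = x16, high half zeroed
ins  v6.d[1], x17   // v6 high half = x17

fmov d7, x19        // v7 low half  = x19, high half zeroed
ins  v7.d[1], x20   // v7 high half = x20

fmov d8, x21        // v8 low half  = x21, high half zeroed
ins  v8.d[1], x22   // v8 high half = x22
```

Each fmov starts a fresh dependency chain for its vector register, so the four pairs are independent of each other and of whatever v5-v8 held before.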

For future reference, instructions for moving elements to / from / between / within vector registers are listed in the Armv8 Architecture Reference Manual section C3.5.13 (in my version), "SIMD move".
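As a rough illustration of what that group covers, here are a few of the other moves, written for 64-bit elements. The register numbers are arbitrary and chosen only for the example.

```asm
umov x0, v6.d[1]        // vector element -> general-purpose register (alias: mov x0, v6.d[1])
fmov x1, d6             // low 64 bits of v6 -> general-purpose register
ins  v7.d[0], v6.d[1]   // element -> element between vectors (alias: mov v7.d[0], v6.d[1])
dup  v8.2d, x16         // broadcast a general-purpose register to both 64-bit lanes
dup  v5.2d, v6.d[1]     // broadcast one vector element to both 64-bit lanes
```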