New posts in assembly

Displaying numbers with DOS

Test whether a register is zero with CMP reg,0 vs OR reg,reg?

Why does mulss take only 3 cycles on Haswell, different from Agner's instruction tables? (Unrolling FP loops with multiple accumulators)

Can x86's MOV really be "free"? Why can't I reproduce this at all?

How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent

How do I achieve the theoretical maximum of 4 FLOPs per cycle?

Boot loader doesn't jump to kernel code

Why should EDX be 0 before using the DIV instruction?

Why doesn't GCC use partial registers?

Assembling 32-bit binaries on a 64-bit system (GNU toolchain)

Why are loops always compiled into "do...while" style (tail jump)?

Can num++ be atomic for 'int num'?

Why does GCC use multiplication by a strange number in implementing integer division?

Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly?

What's the purpose of the LEA instruction?

Referencing the contents of a memory location. (x86 addressing modes)

Fastest way to do horizontal SSE vector sum (or other reduction)

How do I print an integer in Assembly Level Programming without printf from the c library?

Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?

Micro fusion and addressing modes