New posts in assembly

Displaying numbers with DOS

assembly dos x86-16 integer-division signed-integer

Test whether a register is zero with CMP reg,0 vs OR reg,reg?

assembly optimization x86 micro-optimization

Why does mulss take only 3 cycles on Haswell, different from Agner's instruction tables? (Unrolling FP loops with multiple accumulators)

c assembly x86 sse micro-optimization

Can x86's MOV really be "free"? Why can't I reproduce this at all?

c assembly x86 cpu-architecture micro-optimization

How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent

assembly x86 intel cpu-architecture micro-optimization

How do I achieve the theoretical maximum of 4 FLOPs per cycle?

c++ assembly x86-64 cpu-architecture flops

Boot loader doesn't jump to kernel code

assembly virtualbox nasm x86-16 bootloader

Why should EDX be 0 before using the DIV instruction?

assembly x86 integer-division

Why doesn't GCC use partial registers?

assembly gcc x86 x86-64 cpu-architecture

Assembling 32-bit binaries on a 64-bit system (GNU toolchain)

linux assembly build x86 att

Why are loops always compiled into "do...while" style (tail jump)?

performance loops assembly optimization micro-optimization

Can num++ be atomic for 'int num'?

c++ c multithreading assembly atomic

Why does GCC use multiplication by a strange number in implementing integer division?

c gcc assembly x86-64 integer-division

Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly?

c++ performance assembly optimization x86

What's the purpose of the LEA instruction?

assembly x86 x86-64 x86-16

Referencing the contents of a memory location. (x86 addressing modes)

assembly x86 masm addressing-mode

Fastest way to do horizontal SSE vector sum (or other reduction)

assembly optimization floating-point sse simd

How do I print an integer in Assembly Level Programming without printf from the C library?

assembly x86 integer output nasm

Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?

c assembly x86-64 compiler-optimization llvm-codegen

Micro fusion and addressing modes

assembly x86 intel cpu-architecture iaca