New posts in micro-optimization

How much faster are SSE4.2 string instructions than SSE2 for memcmp?

How to force NASM to encode [1 + rax*2] as disp32 + index*2 instead of disp8 + base + index?

Do Java finals help the compiler create more efficient bytecode? [duplicate]

'... != null' or 'null != ...': which performs best?

"enter" vs "push ebp; mov ebp, esp; sub esp, imm" and "leave" vs "mov esp, ebp; pop ebp"

Cycles/cost for L1 Cache hit vs. Register on x86?

Is it useful to use VZEROUPPER if your program+libraries contain no SSE instructions?

Is vxorps-zeroing on AMD Jaguar/Bulldozer/Zen faster with xmm registers than ymm?

Does using xor reg, reg give advantage over mov reg, 0? [duplicate]

Using Intrinsics to Extract And Shift Odd/Even Bits

Weird use of `?:` in `typeid` code

How to force GCC to assume that a floating-point expression is non-negative?

Micro Optimization of a 4-bucket histogram of a large array or list

Which of these pieces of code is faster in Java?

Fast method to copy memory with translation - ARGB to BGR

Weird performance effects from nearby dependent stores in a pointer-chasing loop on IvyBridge. Adding an extra load speeds it up?

Is it more efficient to perform a range check by casting to uint instead of checking for negative values?

Fastest way to strip all non-printable characters from a Java String

Why can't GCC generate an optimal operator== for a struct of two int32s?

Why does my application spend 24% of its life doing a null check?