Aligning to cache line and knowing the cache line size
I am using Linux on an 8-core x86 platform. First, how do I find the cache line size?
$ getconf LEVEL1_DCACHE_LINESIZE
64
Pass the value as a macro definition to the compiler.
$ gcc -DLEVEL1_DCACHE_LINESIZE=`getconf LEVEL1_DCACHE_LINESIZE` ...
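At run time, sysconf(_SC_LEVEL1_DCACHE_LINESIZE) can be used to get the L1 data cache line size (not the cache size). Note that this name is a glibc extension, and the call may return 0 or -1 when the value is not known.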
To know the sizes, you need to look them up in the documentation for the processor; afaik there is no fully portable programmatic way to do it. On the plus side, most cache lines are of a standard size, based on Intel's standards. On x86, cache lines are 64 bytes. To prevent false sharing, however, you need to follow the guidelines of the processor you are targeting (Intel has some special notes on its NetBurst-based processors); generally you need to align to 64 bytes for this (Intel states that you should also avoid crossing 16-byte boundaries).
To do this in C or C++ requires that you use the standard aligned_alloc function (C11) or one of the compiler-specific specifiers such as __attribute__((aligned(64))) for GCC or __declspec(align(64)) for MSVC. To pad between members in a struct so that they land on different cache lines, you need to insert a member big enough to push the next member to the next 64-byte boundary.
Another simple way is to just grep /proc/cpuinfo:
grep cache_alignment /proc/cpuinfo
There's no completely portable way to get the cache line size. But if you're on x86 or x86-64, you can execute the cpuid instruction to get everything you need to know about the cache, including size, cache line size, how many levels, etc.
http://softpixel.com/~cwright/programming/simd/cpuid.php
(Scroll down a little; the page is about SIMD, but it has a section on getting the cache line size.)
As for aligning your data structures, there's also no completely portable way to do it. GCC and VS10 have different ways to specify alignment of a struct. One way to "hack" it is to pad your struct with unused variables until it matches the alignment you want.
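To align your malloc() calls, all the mainstream compilers and C libraries also have aligned allocation functions for that purpose, for example posix_memalign on POSIX systems, _aligned_malloc with MSVC, and aligned_alloc in C11.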