Double alignment
Following the discussion from this post, I have understood that the main reason for the alignment of structure members is performance (and the restrictions of some architectures).
If we look at Microsoft (Visual C++), Borland/CodeGear (C++Builder), Digital Mars (DMC) and GNU (GCC) when compiling for 32-bit x86: the alignment for int is 4 bytes, and if an int is not aligned, it can happen that 2 rows of memory banks have to be read.
My question is: why not make double 4-byte aligned as well? A 4-byte-aligned double would also require reading 2 rows of memory banks.
For example, in the following struct, since double is 8-byte aligned, the actual size of the structure will be sizeof(char) + 7 bytes of padding + sizeof(double) + sizeof(int) + 4 bytes of trailing padding = 24 bytes.
typedef struct structc_tag{
char c;
double d;
int s;
} structc_t;
Thank you
Solution 1:
An extended comment:
According to the GCC documentation for -malign-double:
"Aligning double variables on a two-word boundary produces code that runs somewhat faster on a Pentium at the expense of more memory.
On x86-64, -malign-double is enabled by default.
Warning: if you use the -malign-double switch, structures containing the above types are aligned differently than the published application binary interface specifications for the 386 and are not binary compatible with structures in code compiled without that switch."
A word here means an i386 word, which is 32 bits.
Windows uses 64-bit alignment of double values even in 32-bit mode, while SysV i386 ABI conformant Unices use 32-bit alignment. The 32-bit Windows API, Win32, comes from Windows NT 3.1, which, unlike current generation Windows versions, targeted Intel i386, Alpha, MIPS and even the obscure Intel i860. As native RISC systems like Alpha and MIPS require double values to be 64-bit aligned (otherwise a hardware fault occurs), portability might have been the rationale behind the 64-bit alignment in the Win32 i386 ABI.
64-bit x86 systems, also known as AMD64, x86-64, or x64, expect double values to be 64-bit aligned. Ordinary loads and stores tolerate misalignment (no fault is raised unless alignment checking is enabled), but an access that straddles a cache-line boundary needs an expensive hardware "fix-up" which considerably slows down memory access. That's why double values are 64-bit aligned in all modern x86-64 ABIs (SysV and Windows x64).
Solution 2:
Most compilers will automatically align data values to the word size of the platform or to the size of the data type, whichever is smaller. The examples below assume a 32-bit word size, as on 32-bit x86. (Even on 64-bit systems, int usually remains 32 bits.)
As such, the ordering of members in your struct could possibly waste some memory. In your specific case, you're fine. I'll add in comments the actual footprint of used memory:
typedef struct structc_tag{
char c; // 1 byte
// 3 bytes (padding)
double d; // 8 bytes
int s; // 4 bytes
} structc_t; // total: 16 bytes
This rule applies to structures too, so even if you rearranged them so the smallest field was last, you would still have a struct of the same size (16 bytes).
typedef struct structc_tag{
double d; // 8 bytes
int s; // 4 bytes
char c; // 1 byte
// 3 bytes (padding)
} structc_t; // total: 16 bytes
If you were to declare more fields that were smaller than 4 bytes, you could see some memory reductions if you grouped them together by size. For example:
typedef struct structc_tag{
double d1; // 8 bytes
double d2; // 8 bytes
double d3; // 8 bytes
int s1; // 4 bytes
int s2; // 4 bytes
int s3; // 4 bytes
short s4; // 2 bytes
short s5; // 2 bytes
short s6; // 2 bytes
char c1; // 1 byte
char c2; // 1 byte
char c3; // 1 byte
// 3 bytes (padding)
} structc_t; // total: 48 bytes
Declaring a poorly ordered struct could waste a lot of memory, unless the compiler reorders your elements (which, in general, it won't do without being explicitly told to):
typedef struct structc_tag{
int s1; // 4 bytes
char c1; // 1 byte
// 3 bytes (padding)
int s2; // 4 bytes
char c2; // 1 byte
// 3 bytes (padding)
int s3; // 4 bytes
char c3; // 1 byte
// 3 bytes (padding)
} structc_t; // total: 24 bytes
// (9 bytes wasted, or 38%)
// (optimal size: 16 bytes (1 byte wasted))
Doubles are larger than 32 bits and thus, according to the rule in the first section, are 32-bit aligned. Someone mentioned a compiler option that changes the alignment, and that the default differs between 32-bit and 64-bit systems; this is also valid. So the real answer about doubles is that it depends on the platform and the compiler.
Memory performance is governed by words: loading from memory happens in stages that depend on the placement of data. If the data covers one word (i.e. is word aligned), only that word need be loaded. If it is not aligned correctly (e.g. an int at 0x2), the processor must load 2 words in order to correctly read its value. The same applies to doubles, which normally take up 2 words but, if misaligned, take up 3. On 64-bit systems, where native loading of 64-bit quantities is possible, doubles behave like 32-bit ints on 32-bit systems: if properly aligned, they can be fetched with one load, but otherwise they will require 2.
Solution 3:
First of all, it is the architecture that imposes the alignment requirement; some architectures will tolerate unaligned memory accesses, others won't.
Let's take the 32-bit x86 Windows platform as an example. On this platform the alignment requirements for int and double are 4 bytes and 8 bytes respectively.
It is clear why the alignment requirement for int is 4 bytes: so that the CPU can read it all in a single access.
The reason the alignment requirement for double is 8 bytes and not 4 bytes is this: if it were 4 bytes, think about what would happen if the double were located at address 60 and the cache line size were 64 bytes. In that case the processor would need to load 2 cache lines from memory into the cache. If the double is 8-byte aligned this cannot happen, since the double will always be part of one cache line and never divided between two.
... 58 59 | 60 61 62 63 : 64 65 66 67 | 68 69 70 71 ...
          |<----- 8-byte double ----->|
      Cache Line 1      :   Cache Line 2