Why is bit endianness an issue in bitfields?
Solution 1:
By the C standard, the compiler is free to store the bit-field pretty much any way it wants. You can never make any assumptions about where the bits are allocated. Here are just a few bit-field-related things that are not specified by the C standard:
Unspecified behavior
- The alignment of the addressable storage unit allocated to hold a bit-field (6.7.2.1).
Implementation-defined behavior
- Whether a bit-field can straddle a storage-unit boundary (6.7.2.1).
- The order of allocation of bit-fields within a unit (6.7.2.1).
Big/little endian is of course also implementation-defined. This means that your struct could be allocated in the following ways (assuming 16 bit ints):
PADDING : 8
f1 : 1
f2 : 3
f3 : 4
or
PADDING : 8
f3 : 4
f2 : 3
f1 : 1
or
f1 : 1
f2 : 3
f3 : 4
PADDING : 8
or
f3 : 4
f2 : 3
f1 : 1
PADDING : 8
Which one applies? Take a guess, or read the in-depth backend documentation of your compiler. Add the complexity of 32-bit integers, in big or little endian, to this. Then add the fact that the compiler is allowed to insert any number of padding bytes anywhere inside your bit field, because it is treated as a struct (it can't add padding at the very beginning of the struct, but everywhere else).
And then I haven't even mentioned what happens if you use plain int as the bit-field type (implementation-defined behavior), or any type other than (unsigned) int (implementation-defined behavior).
So to answer the question: there is no such thing as portable bit-field code, because the C standard is extremely vague about how bit fields should be implemented. The only thing bit-fields can be trusted with is to be chunks of boolean values, where the programmer isn't concerned with the location of the bits in memory.
The only portable solution is to use the bit-wise operators instead of bit fields. The generated machine code will be essentially the same, but the bit layout will be deterministic. Bit-wise operators are 100% portable on any C compiler for any system.
Solution 2:
As far as I understand, bitfields are purely compiler constructs
And that's part of the problem. If the use of bit-fields was restricted to what the compiler 'owned', then how the compiler packed bits or ordered them would be of pretty much no concern to anyone.
However, bit-fields are probably used far more often to model constructs that are external to the compiler's domain - hardware registers, the 'wire' protocol for communications, or file format layout. These things have strict requirements on how bits have to be laid out, and using bit-fields to model them means that you have to rely on implementation-defined and - even worse - unspecified behavior of how the compiler will lay out the bit-field.
In short, bit-fields are not specified well enough to make them useful for the situations they seem to be most commonly used for.
Solution 3:
ISO/IEC 9899: 6.7.2.1 / 10
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
It is safer to use bit shift operations instead of making any assumptions on bit field ordering or alignment when trying to write portable code, regardless of system endianness or bitness.
Also see EXP11-C. Do not apply operators expecting one type to data of an incompatible type.
Solution 4:
Bit-field accesses are implemented in terms of operations on the underlying type - in this example, unsigned int. So if you have something like:
struct x {
    unsigned int a : 4;
    unsigned int b : 8;
    unsigned int c : 4;
};
When you access field b, the compiler accesses an entire unsigned int and then shifts and masks the appropriate bit range. (Well, it doesn't have to, but we can pretend that it does.)
On big endian, layout will be something like this (most significant bit first):
AAAABBBB BBBBCCCC
On little endian, layout will be like this:
BBBBAAAA CCCCBBBB
If you want to access the big-endian layout from little endian or vice versa, you'll have to do some extra work. That extra portability would carry a performance penalty, and since struct layout is already non-portable, language implementors went with the faster version.
This makes a lot of assumptions. Also note that sizeof(struct x) == 4 on most platforms.