What is the size of a boolean in C#? Does it really take 4 bytes?
I have two structs with arrays of bytes and booleans:
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 4)]
struct struct1
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public byte[] values;
}

[StructLayout(LayoutKind.Sequential, Pack = 4)]
struct struct2
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public bool[] values;
}
And the following code:
class main
{
    public static void Main()
    {
        Console.WriteLine("sizeof array of bytes: " + Marshal.SizeOf(typeof(struct1)));
        Console.WriteLine("sizeof array of bools: " + Marshal.SizeOf(typeof(struct2)));
        Console.ReadKey();
    }
}
That gives me the following output:
sizeof array of bytes: 3
sizeof array of bools: 12
It seems that a boolean takes 4 bytes of storage. Ideally a boolean would only take one bit (false or true, 0 or 1, etc.).
What is happening here? Is the boolean type really so inefficient?
The bool type has a checkered history with many incompatible choices between language runtimes. This started with a historical design choice made by Dennis Ritchie, the guy who invented the C language. It did not have a bool type; the alternative was int, where a value of 0 represents false and any other value is considered true.
This choice was carried forward in the Winapi, the primary reason to use pinvoke. It has a typedef for BOOL, which is an alias for the C compiler's int keyword. If you don't apply an explicit [MarshalAs] attribute then a C# bool is converted to a BOOL, thus producing a field that is 4 bytes long.
Whatever you do, your struct declaration needs to match the runtime choice made in the language you interop with. As noted, that is BOOL for the Winapi, but most C++ implementations chose a single byte, and most COM Automation interop uses VARIANT_BOOL, which is a short.
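To make the three conventions concrete, here is a minimal sketch that declares one struct per unmanaged boolean type and asks the marshaller for its size. The struct and class names (Win32Flag, CppFlag, ComFlag, BoolMarshalSizes) are made up for illustration; only the UnmanagedType values come from the framework.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct Win32Flag
{
    // Win32 BOOL: the default when no [MarshalAs] is applied to a bool field.
    [MarshalAs(UnmanagedType.Bool)]
    public bool Value;
}

[StructLayout(LayoutKind.Sequential)]
struct CppFlag
{
    // 1-byte boolean, matching the bool of most C++ compilers.
    [MarshalAs(UnmanagedType.I1)]
    public bool Value;
}

[StructLayout(LayoutKind.Sequential)]
struct ComFlag
{
    // 2-byte OLE VARIANT_BOOL used by COM Automation.
    [MarshalAs(UnmanagedType.VariantBool)]
    public bool Value;
}

static class BoolMarshalSizes
{
    public static int Win32Size() => Marshal.SizeOf(typeof(Win32Flag));
    public static int CppSize() => Marshal.SizeOf(typeof(CppFlag));
    public static int ComSize() => Marshal.SizeOf(typeof(ComFlag));

    static void Main()
    {
        Console.WriteLine("BOOL:         " + Win32Size()); // 4
        Console.WriteLine("C++ bool:     " + CppSize());   // 1
        Console.WriteLine("VARIANT_BOOL: " + ComSize());   // 2
    }
}
```

These are marshaled (interop) sizes only; they say nothing about how the CLR stores the field internally.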
The actual size of a C# bool is one byte. A strong design goal of the CLR is that you cannot find out. Layout is an implementation detail that depends on the processor too much. Processors are very picky about variable types and alignment; wrong choices can significantly affect performance and cause runtime errors. By making the layout undiscoverable, .NET can provide a universal type system that does not depend on the actual runtime implementation.
In other words, you always have to marshal a structure at runtime to nail down the layout, at which time the conversion from the internal layout to the interop layout is made. That can be very fast if the layout is identical, and slow when fields need to be rearranged, since that always requires creating a copy of the struct. The technical term for this is blittable: passing a blittable struct to native code is fast because the pinvoke marshaller can simply pass a pointer.
Performance is also the core reason why a bool is not a single bit. Few processors make a bit directly addressable; the smallest unit is a byte. An extra instruction is required to fish the bit out of the byte, and that doesn't come for free. And updating a single bit is never atomic.
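For illustration, this is roughly what packing booleans into single bits looks like when done by hand. It is a sketch with a hypothetical helper class (BitFlags), not something the CLR does for you; note that every read needs a mask and every write is a read-modify-write of the whole byte.

```csharp
using System;

static class BitFlags
{
    // Returns a copy of 'bits' with the bit at 'index' (0..7) set or cleared.
    // This is a read-modify-write of the whole byte, so it is not atomic.
    public static byte Set(byte bits, int index, bool value)
    {
        byte mask = (byte)(1 << index);
        return value ? (byte)(bits | mask) : (byte)(bits & ~mask);
    }

    // Reading one flag needs an extra mask-and-compare step.
    public static bool Get(byte bits, int index) => (bits & (1 << index)) != 0;

    static void Main()
    {
        byte bits = 0;
        bits = Set(bits, 0, true);
        bits = Set(bits, 1, false);
        bits = Set(bits, 2, true);
        Console.WriteLine(bits);         // 5 (binary 101)
        Console.WriteLine(Get(bits, 2)); // True
    }
}
```

If you really need dense storage, System.Collections.BitArray does this bookkeeping for you at the same cost per access.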
The C# compiler isn't otherwise shy about telling you that it takes 1 byte: use sizeof(bool). This is still not a fantastic predictor for how many bytes a field takes at runtime. The CLR also needs to implement the .NET memory model, and it promises that simple variable updates are atomic. That requires variables to be properly aligned in memory so the processor can update them with a single memory-bus cycle. Pretty often, a bool actually requires 4 or 8 bytes in memory because of this: extra padding is added to ensure that the next member is aligned properly.
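A quick way to see the two numbers side by side is to compare the compiler's answer with the marshaller's. The class name BoolSizes is just for this sketch.

```csharp
using System;
using System.Runtime.InteropServices;

static class BoolSizes
{
    // Managed size: a compile-time constant, no unsafe context needed for built-in types.
    public static int ManagedSize() => sizeof(bool);

    // Interop size: the default unmanaged representation of bool is the 4-byte Win32 BOOL.
    public static int MarshaledSize() => Marshal.SizeOf(typeof(bool));

    static void Main()
    {
        Console.WriteLine("sizeof(bool):    " + ManagedSize());   // 1
        Console.WriteLine("Marshal.SizeOf:  " + MarshaledSize()); // 4
    }
}
```

Neither number tells you how much space a bool field occupies inside a particular object, since padding depends on the neighbouring fields.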
The CLR actually takes advantage of layout being undiscoverable: it can optimize the layout of a class and rearrange the fields so the padding is minimized. So, say, if you have a class with a bool + int + bool member then it would take 1 + (3) + 4 + 1 + (3) bytes of memory, where (3) is the padding, for a total of 12 bytes. That is 50% waste. Automatic layout rearranges this to 1 + 1 + (2) + 4 = 8 bytes. Only a class has automatic layout; structs have sequential layout by default.
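The padding arithmetic above can be reproduced with a small model. This is only a sketch of the usual "align each field to its own size" rule, assuming power-of-two field sizes; it is not the CLR's actual layout algorithm, and LayoutMath is a hypothetical name.

```csharp
using System;
using System.Linq;

static class LayoutMath
{
    // Rounds 'offset' up to the next multiple of 'alignment'.
    static int AlignUp(int offset, int alignment) =>
        (offset + alignment - 1) / alignment * alignment;

    // Total size when fields are placed in declaration order, each aligned to
    // its own size, with the total rounded up to the largest alignment.
    public static int SequentialSize(params int[] fieldSizes)
    {
        int offset = 0, maxAlign = 1;
        foreach (int size in fieldSizes)
        {
            offset = AlignUp(offset, size) + size;
            maxAlign = Math.Max(maxAlign, size);
        }
        return AlignUp(offset, maxAlign);
    }

    // Same model after sorting fields largest-first, one simple way to
    // minimize padding (assumption: the real runtime may order differently).
    public static int ReorderedSize(params int[] fieldSizes) =>
        SequentialSize(fieldSizes.OrderByDescending(s => s).ToArray());

    static void Main()
    {
        Console.WriteLine(SequentialSize(1, 4, 1)); // 12: bool + int + bool in order
        Console.WriteLine(ReorderedSize(1, 4, 1));  // 8: int first, then both bools
    }
}
```

Sorting largest-first gives 4 + 1 + 1 + (2) rather than 1 + 1 + (2) + 4, but the total is the same 8 bytes.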
More bleakly, a bool can require as many as 32 bytes in a C++ program compiled with a modern C++ compiler that supports the AVX instruction set. AVX imposes a 32-byte alignment requirement, so the bool variable may end up with 31 bytes of padding. This is also the core reason why a .NET jitter does not emit SIMD instructions unless they are explicitly wrapped: it can't get the alignment guarantee.
Firstly, this is only the size for interop. It doesn't represent the size in managed code of the array. That's 1 byte per bool - at least on my machine. You can test it for yourself with this code:
using System;

class Program
{
    static void Main(string[] args)
    {
        int size = 10000000;
        object array = null;
        long before = GC.GetTotalMemory(true);
        array = new bool[size];
        long after = GC.GetTotalMemory(true);

        double diff = after - before;

        Console.WriteLine("Per value: " + diff / size);

        // Stop the GC from messing up our measurements
        GC.KeepAlive(array);
    }
}
Now, for marshalling arrays by value, as you are, the documentation says:
When the MarshalAsAttribute.Value property is set to ByValArray, the SizeConst field must be set to indicate the number of elements in the array. The ArraySubType field can optionally contain the UnmanagedType of the array elements when it is necessary to differentiate among string types. You can use this UnmanagedType only on an array that whose elements appear as fields in a structure.
So we look at ArraySubType, and that has documentation of:
You can set this parameter to a value from the UnmanagedType enumeration to specify the type of the array's elements. If a type is not specified, the default unmanaged type corresponding to the managed array's element type is used.
Now looking at UnmanagedType, there's:
Bool
A 4-byte Boolean value (true != 0, false = 0). This is the Win32 BOOL type.
So that's the default for bool, and it's 4 bytes because that corresponds to the Win32 BOOL type - so if you're interoperating with code expecting a BOOL array, it does exactly what you want.
Now you can specify the ArraySubType as I1 instead, which is documented as:
A 1-byte signed integer. You can use this member to transform a Boolean value into a 1-byte, C-style bool (true = 1, false = 0).
So if the code you're interoperating with expects 1 byte per value, just use:
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 3, ArraySubType = UnmanagedType.I1)]
public bool[] values;
Your code will then show that as taking up 1 byte per value, as expected.
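Putting it together, here is a small self-contained sketch comparing the default marshaling with the I1 variant. The struct names DefaultBools and PackedBools are invented for this example; the attributes are the same ones used in the question.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 4)]
struct DefaultBools
{
    // No ArraySubType: each element marshals as a 4-byte Win32 BOOL.
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public bool[] values;
}

[StructLayout(LayoutKind.Sequential, Pack = 4)]
struct PackedBools
{
    // ArraySubType = I1: each element marshals as a 1-byte C-style bool.
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3, ArraySubType = UnmanagedType.I1)]
    public bool[] values;
}

static class ArraySizes
{
    public static int DefaultSize() => Marshal.SizeOf(typeof(DefaultBools));
    public static int PackedSize() => Marshal.SizeOf(typeof(PackedBools));

    static void Main()
    {
        Console.WriteLine("default bool[3]: " + DefaultSize()); // 12
        Console.WriteLine("I1 bool[3]:      " + PackedSize());  // 3
    }
}
```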