Different object size of True and False in Python 3
Experimenting with magic methods (__sizeof__ in particular) on different Python objects, I stumbled over the following behaviour:
Python 2.7
>>> False.__sizeof__()
24
>>> True.__sizeof__()
24
Python 3.x
>>> False.__sizeof__()
24
>>> True.__sizeof__()
28
What changed in Python 3 that makes the size of True greater than the size of False?
It is because bool is a subclass of int in both Python 2 and 3.
>>> issubclass(bool, int)
True
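Because of this subclass relationship, True and False behave as the integers 1 and 0 wherever an int is expected. A quick check in plain Python 3:

```python
# bool is a subclass of int, so True/False take part in integer arithmetic
assert issubclass(bool, int)
assert isinstance(True, int)

print(True + True)                      # 2
print(int(False), int(True))            # 0 1
print(sum([True, False, True, True]))   # 3 (counts the True values)
```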
But the int implementation has changed.
In Python 2, int was the fixed-width type, 32 or 64 bits depending on the system, as opposed to the arbitrary-length long.
In Python 3, int is arbitrary-length: the long of Python 2 was renamed to int, and the original Python 2 int was dropped altogether.
In Python 2 you get exactly the same behaviour for the long objects 1L and 0L:
Python 2.7.15rc1 (default, Apr 15 2018, 21:51:34)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getsizeof(1L)
28
>>> sys.getsizeof(0L)
24
The long/Python 3 int is a variable-length object, just like a tuple: when it is allocated, enough memory is reserved to hold all the binary digits required to represent it. The length of the variable part is stored in the object header. 0 requires no binary digits (its variable length is 0), but even 1 spills over and requires an extra digit.
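This growth can be observed from Python. The sketch below assumes a CPython build: it reads the digit width from sys.int_info instead of hard-coding 30, uses the type attributes int.__basicsize__ (the fixed header) and int.__itemsize__ (bytes per digit), and ndigits is a hypothetical helper, not a CPython API. Zero is left out on purpose, since the sketch only models nonzero values.

```python
import sys

BITS = sys.int_info.bits_per_digit   # 30 on typical 64-bit builds
HEADER = int.__basicsize__           # fixed part of every int object
DIGIT = int.__itemsize__             # bytes per digit (4 when BITS == 30)

def ndigits(n: int) -> int:
    """Hypothetical helper: digits needed to store abs(n); zero needs none."""
    return -(-abs(n).bit_length() // BITS)  # ceiling division

for n in (1, 2**BITS - 1, 2**BITS, 2**(2 * BITS)):
    predicted = HEADER + ndigits(n) * DIGIT
    print(f"{n:>20}: {ndigits(n)} digit(s), predicted {predicted}, "
          f"getsizeof {sys.getsizeof(n)}")
```

Each extra block of BITS bits adds exactly one digit, so the size steps up by DIGIT bytes at 2**BITS, 2**(2*BITS), and so on.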
That is, 0 is represented as a binary string of length 0:
<>
and 1 as a 30-bit binary string:
<000000000000000000000000000001>
The default configuration of CPython stores 30 bits in each uint32_t digit, so 2**30 - 1 still fits in 28 bytes on x86-64, while 2**30 requires 32. 2**30 - 1 is represented as
<111111111111111111111111111111>
that is, all 30 value bits set to 1. 2**30 needs a second digit, and its internal representation (most significant digit first) is
<000000000000000000000000000001000000000000000000000000000000>
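The binary strings above can be reproduced with a small helper. digit_string is a hypothetical name used only for illustration; it assumes 30-bit digits as described above and shows the layout (most significant digit first) rather than reading CPython's memory.

```python
SHIFT = 30  # assumed digit width (the default CPython configuration)

def digit_string(n: int) -> str:
    """Hypothetical helper: abs(n) as 30-bit digits, most significant digit first."""
    bits = format(abs(n), "b") if n else ""   # zero has no digits at all
    width = -(-len(bits) // SHIFT) * SHIFT     # round up to a whole number of digits
    return "<" + bits.zfill(width) + ">"

for n in (0, 1, 2**30 - 1, 2**30):
    print(f"{n:>10} {digit_string(n)}")
```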
As for True using 28 bytes instead of 24, you need not worry. True is a singleton, so only 4 bytes are lost in total in any Python program, not 4 for every use of True.
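The singleton point can be checked directly: however many booleans a program produces, they are references to the same two objects, so the extra digit of True is paid for only once. A minimal check:

```python
# Every comparison returns one of the two shared bool objects
flags = [x > 0 for x in range(-5, 5)]
assert all(f is True or f is False for f in flags)
assert bool(7) is True and bool(0) is False

# Ten list entries, but only two distinct objects behind them
print(len(flags), len({id(f) for f in flags}))  # 10 2
```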
Both True and False are longobjects in CPython:
struct _longobject _Py_FalseStruct = {
    PyVarObject_HEAD_INIT(&PyBool_Type, 0)
    { 0 }
};
struct _longobject _Py_TrueStruct = {
    PyVarObject_HEAD_INIT(&PyBool_Type, 1)
    { 1 }
};
You can thus say that a Boolean is a subclass of the Python 3 int where True has the value 1 and False has the value 0. Both structs are initialized with the PyVarObject_HEAD_INIT macro, passing a reference to PyBool_Type as the type parameter and an ob_size of 0 for False and 1 for True.
Since Python 3 there is no separate long type anymore: the two have been merged, and an int object occupies a different amount of memory depending on the size of the number.
If we inspect the source code of the longobject type, we see:
/* Long integer representation.
   The absolute value of a number is equal to
        SUM(for i=0 through abs(ob_size)-1) ob_digit[i] * 2**(SHIFT*i)
   Negative numbers are represented with ob_size < 0;
   zero is represented by ob_size == 0.
   In a normalized number, ob_digit[abs(ob_size)-1] (the most significant
   digit) is never zero. Also, in all cases, for all valid i,
        0 <= ob_digit[i] <= MASK.
   The allocation function takes care of allocating extra memory
   so that ob_digit[0] ... ob_digit[abs(ob_size)-1] are actually available.

   CAUTION: Generic code manipulating subtypes of PyVarObject has to
   aware that ints abuse ob_size's sign bit.
*/
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
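The representation described in that comment can be modelled in a few lines of Python. This is a sketch, not CPython code: to_long_repr and from_long_repr are hypothetical names, SHIFT is assumed to be 30, and the sign is kept in ob_size exactly as the comment describes.

```python
SHIFT = 30                # assumed digit width
MASK = (1 << SHIFT) - 1   # largest value a single digit may hold

def to_long_repr(n: int) -> tuple:
    """Hypothetical model: return (ob_size, ob_digit) for n."""
    digits = []
    m = abs(n)
    while m:
        digits.append(m & MASK)  # ob_digit[0] is the least significant digit
        m >>= SHIFT
    ob_size = -len(digits) if n < 0 else len(digits)  # the sign lives in ob_size
    return ob_size, digits

def from_long_repr(ob_size: int, ob_digit: list) -> int:
    """SUM(for i=0 through abs(ob_size)-1) ob_digit[i] * 2**(SHIFT*i), signed by ob_size."""
    value = sum(ob_digit[i] << (SHIFT * i) for i in range(abs(ob_size)))
    return -value if ob_size < 0 else value

for n in (0, 1, -1, 2**30 - 1, 2**30, -(2**65)):
    size, digits = to_long_repr(n)
    assert from_long_repr(size, digits) == n
    print(n, size, digits)
```

Zero comes out as ob_size == 0 with no digits, and the loop never emits a leading zero digit, so the model produces normalized numbers as the comment requires.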
In short, a _longobject can be seen as an array of "digits", but these are not decimal digits: each one is a group of bits, and the digits can be added, multiplied, and so on.
As the comment states:
zero is represented by ob_size == 0.
So when the value is zero, no digits are used, whereas small integers (absolute values below 2**30 in the default CPython build) take one digit, and so on.
In Python 2 there were two representations for integers: ints with a fixed size (you could see this as "one digit"), and longs with multiple digits. Since bool was a subclass of int, True and False occupied the same space.
I haven't looked at the CPython code for this, but I believe it has to do with how integers were reworked in Python 3. When long was dropped, the two implementations were probably unified: int in Python 3 is an arbitrary-size integer, the same as long was in Python 2. Since bool is stored the same way as the new int, the change affects both.
Interesting part:
>>> (0).__sizeof__()
24
>>> (1).__sizeof__() # Here one more "block" is allocated
28
>>> (2**30-1).__sizeof__() # The largest integer that still fits into 28 bytes
28
The 24 bytes that even 0 occupies are the object header; add 4 bytes per digit and the equation is complete.
Take a look at the CPython code for True and False: internally they are represented as integers.
PyTypeObject PyBool_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "bool",
    sizeof(struct _longobject),
    0,
    0,                      /* tp_dealloc */
    0,                      /* tp_print */
    0,                      /* tp_getattr */
    0,                      /* tp_setattr */
    0,                      /* tp_reserved */
    bool_repr,              /* tp_repr */
    &bool_as_number,        /* tp_as_number */
    0,                      /* tp_as_sequence */
    0,                      /* tp_as_mapping */
    0,                      /* tp_hash */
    0,                      /* tp_call */
    bool_repr,              /* tp_str */
    0,                      /* tp_getattro */
    0,                      /* tp_setattro */
    0,                      /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT,     /* tp_flags */
    bool_doc,               /* tp_doc */
    0,                      /* tp_traverse */
    0,                      /* tp_clear */
    0,                      /* tp_richcompare */
    0,                      /* tp_weaklistoffset */
    0,                      /* tp_iter */
    0,                      /* tp_iternext */
    0,                      /* tp_methods */
    0,                      /* tp_members */
    0,                      /* tp_getset */
    &PyLong_Type,           /* tp_base */
    0,                      /* tp_dict */
    0,                      /* tp_descr_get */
    0,                      /* tp_descr_set */
    0,                      /* tp_dictoffset */
    0,                      /* tp_init */
    0,                      /* tp_alloc */
    bool_new,               /* tp_new */
};
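Two fields of this table are visible from Python: tp_base set to &PyLong_Type is why bool.__base__ is int, and tp_flags is only Py_TPFLAGS_DEFAULT (without Py_TPFLAGS_BASETYPE), so bool cannot be subclassed. A short check; can_subclass is a hypothetical helper written for this illustration:

```python
# tp_base: bool's only base class is int
assert bool.__base__ is int
print(bool.__mro__)  # (<class 'bool'>, <class 'int'>, <class 'object'>)

def can_subclass(base: type) -> bool:
    """Hypothetical helper: True if Python lets us derive a new class from base."""
    try:
        type("Probe", (base,), {})
    except TypeError:
        return False
    return True

# tp_flags lacks Py_TPFLAGS_BASETYPE, so deriving from bool is rejected
print(can_subclass(int), can_subclass(bool))  # True False
```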
/* The objects representing bool values False and True */
struct _longobject _Py_FalseStruct = {
    PyVarObject_HEAD_INIT(&PyBool_Type, 0)
    { 0 }
};
struct _longobject _Py_TrueStruct = {
    PyVarObject_HEAD_INIT(&PyBool_Type, 1)
    { 1 }