Variable's memory size in Python [duplicate]

I am writing Python code to do some big-number calculations, and I am seriously concerned about the memory they use.

Thus, I want to count every bit of each variable.

For example, I have a variable x, which is a big number, and want to count the number of bits for representing x.

The following code is obviously useless, since len() is not defined for integers:

x=2**1000
len(x)

So I turned to the following code:

x=2**1000
len(repr(x))

The variable x (in decimal) is:

10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376

but the above code returns 303.

The number above has 302 digits, so I believe that 303 is related to the string length only and says nothing about memory.

So, here comes my original question:

How can I know the memory size of variable x?

One more thing; in C/C++ language, if I define

int z=1;

This means that 4 bytes = 32 bits are allocated for z (on typical platforms), and the bits are arranged as 00..001 (31 0's and one 1).

Here, my variable x is huge, and I don't know whether it follows the same memory allocation rule.


Solution 1:

Use sys.getsizeof to get the size of an object, in bytes.

>>> from sys import getsizeof
>>> a = 42
>>> getsizeof(a)
12
>>> a = 2**1000
>>> getsizeof(a)
146
>>>

Note that the size and layout of an object are purely implementation-specific. CPython, for example, may use totally different internal data structures than IronPython. So the size of an object may vary from implementation to implementation, and also between versions and builds (32-bit vs. 64-bit) of the same implementation.
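Since the question asks about bits rather than bytes, it may help to separate three different quantities: the number of bits in the value itself (int.bit_length()), the number of decimal digits (what len(str(x)) counts), and the bytes allocated for the whole object (sys.getsizeof). A minimal sketch; the getsizeof result depends on the interpreter and platform, so no value is shown for it:

```python
import sys

x = 2**1000

# Bits needed to represent the value itself (sign excluded).
value_bits = x.bit_length()        # 1001 for 2**1000

# Decimal digits in the value; this is what the string length measures.
decimal_digits = len(str(x))       # 302

# Bytes used by the whole int object, including its header.
# Implementation-specific, so the exact number is not shown here.
object_bytes = sys.getsizeof(x)

print(value_bits, decimal_digits, object_bytes)
```

The object size is always larger than value_bits / 8, because every Python object carries a header in addition to its payload.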

Solution 2:

Regarding the internal structure of a Python long, check sys.int_info (or sys.long_info for Python 2.7).

>>> import sys
>>> sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4)

Python either stores 30 bits into 4 bytes (most 64-bit systems) or 15 bits into 2 bytes (most 32-bit systems). Comparing the actual memory usage with calculated values, I get

>>> import math, sys
>>> a=0
>>> sys.getsizeof(a)
24
>>> a=2**100
>>> sys.getsizeof(a)
40
>>> a=2**1000
>>> sys.getsizeof(a)
160
>>> 24+4*math.ceil(100/30)
40
>>> 24+4*math.ceil(1000/30)
160

The 24 bytes reported for 0 are pure object overhead, since no bits are stored. The memory requirements for larger values match the calculated values.
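The same calculation can be wrapped in a small helper. This is only a sketch under assumptions: the name estimated_int_size is made up for illustration, it targets CPython on Python 3, and it derives the header size from sys.getsizeof(1) instead of hard-coding 24, because the size reported for 0 differs between CPython versions. It uses bit_length() for the exact bit count (2**1000 needs 1001 bits, which happens to round up to the same number of 30-bit digits as 1000).

```python
import sys

def estimated_int_size(n):
    """Estimate sys.getsizeof(n) for a nonzero int (CPython assumption)."""
    bits_per_digit = sys.int_info.bits_per_digit   # 30 or 15
    sizeof_digit = sys.int_info.sizeof_digit       # 4 or 2

    # The value 1 occupies exactly one internal digit, so subtracting
    # one digit from its size leaves the fixed per-object overhead.
    header = sys.getsizeof(1) - sizeof_digit

    # Number of internal digits: ceiling of bit_length / bits_per_digit.
    ndigits = -(-n.bit_length() // bits_per_digit)
    return header + sizeof_digit * ndigits

for value in (1, 2**100, 2**1000):
    print(value.bit_length(), estimated_int_size(value), sys.getsizeof(value))
```

On a CPython build the last two columns are expected to agree; on other implementations they may not, since the internal layout is not part of the language.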

If your numbers are so large that you are concerned about the 6.25% unused bits (2 of every 32), you should probably look at the gmpy2 library. Its internal representation uses all available bits, and computations are significantly faster for large values (say, greater than 100 digits).