Why does an empty string in Python sometimes take up 49 bytes and sometimes 51?
This sounds like something is retrieving the wchar representation of the string object. As of CPython 3.7, the way the CPython Unicode representation works out, an empty string is normally stored in "compact ASCII" representation, and the base data and padding for a compact ASCII string on a 64-bit build works out to 48 bytes, plus one byte of string data (just the null terminator). You can see the relevant header file here.
For now (this is scheduled for removal in 4.0), there is also an option to retrieve a wchar_t representation of a string. On a platform with 2-byte wchar_t, the wchar representation of an empty string is 2 bytes (just the null terminator again). The wchar representation is cached on the string on first access, and str.__sizeof__
accounts for this extra data when it exists, resulting in a 51-byte total.
https://docs.python.org/3.5/library/sys.html#sys.getsizeof
sys
is system specific so it can easily differ. This is often overlooked by everyone. All system specific stuff in python has been dumped in the sys
package for years. For e.g sys.getwindowsversion()
is not portable by definition but it's there. It like the bottomless pit of rejects in the perfect world of cross platform coding. What you see is one of the interesting nuggets of Python.
from getsizeof
docs:
Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.
getsizeof()
calls the object’s__sizeof__
method and adds an additional garbage collector overhead if the object is managed by the garbage collector.
When Garbage collection is in use the OS will add those extra bits. If you read Python and GC Q & A When are objects garbage collected in python? the folks have gone into excruciating detail expounding the GC and how it will affect the memory/refcount and bits blah blah.
I hope that explains where this coming from. If you don't use system
level attributes but more pythonic attributes then you will get consistent sizes.