Converting int to bytes in Python 3
I was trying to build this bytes object in Python 3:
b'3\r\n'
so I tried the obvious (for me), and found a weird behaviour:
>>> bytes(3) + b'\r\n'
b'\x00\x00\x00\r\n'
Apparently:
>>> bytes(10)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
I've been unable to see any pointers on why the bytes conversion works this way reading the documentation. However, I did find some surprise messages in this Python issue about adding format
to bytes (see also Python 3 bytes formatting):
http://bugs.python.org/issue3982
This interacts even more poorly with oddities like bytes(int) returning zeroes now
and:
It would be much more convenient for me if bytes(int) returned the ASCIIfication of that int; but honestly, even an error would be better than this behavior. (If I wanted this behavior - which I never have - I'd rather it be a classmethod, invoked like "bytes.zeroes(n)".)
Can someone explain me where this behaviour comes from?
From python 3.2 you can do
>>> (1024).to_bytes(2, byteorder='big')
b'\x04\x00'
https://docs.python.org/3/library/stdtypes.html#int.to_bytes
def int_to_bytes(x: int) -> bytes:
return x.to_bytes((x.bit_length() + 7) // 8, 'big')
def int_from_bytes(xbytes: bytes) -> int:
return int.from_bytes(xbytes, 'big')
Accordingly, x == int_from_bytes(int_to_bytes(x))
.
Note that the above encoding works only for unsigned (non-negative) integers.
For signed integers, the bit length is a bit more tricky to calculate:
def int_to_bytes(number: int) -> bytes:
return number.to_bytes(length=(8 + (number + (number < 0)).bit_length()) // 8, byteorder='big', signed=True)
def int_from_bytes(binary_data: bytes) -> Optional[int]:
return int.from_bytes(binary_data, byteorder='big', signed=True)
That's the way it was designed - and it makes sense because usually, you would call bytes
on an iterable instead of a single integer:
>>> bytes([3])
b'\x03'
The docs state this, as well as the docstring for bytes
:
>>> help(bytes)
...
bytes(int) -> bytes object of size given by the parameter initialized with null bytes
You can use the struct's pack:
In [11]: struct.pack(">I", 1)
Out[11]: '\x00\x00\x00\x01'
The ">" is the byte-order (big-endian) and the "I" is the format character. So you can be specific if you want to do something else:
In [12]: struct.pack("<H", 1)
Out[12]: '\x01\x00'
In [13]: struct.pack("B", 1)
Out[13]: '\x01'
This works the same on both python 2 and python 3.
Note: the inverse operation (bytes to int) can be done with unpack.