Why do I get an int when I index bytes?
I'm trying to get the first char of a byte string in Python 3.4, but when I index it, I get an int:
>>> my_bytes = b'just a byte string'
>>> my_bytes
b'just a byte string'
>>> my_bytes[0]
106
>>> type(my_bytes[0])
<class 'int'>
This seems unintuitive to me, as I was expecting to get b'j'.
I have discovered that I can get the value I expect, but it feels like a hack to me.
>>> my_bytes[0:1]
b'j'
Can someone please explain why this happens?
The bytes type is a Binary Sequence type, and is explicitly documented as containing a sequence of integers in the range 0 to 255.
From the documentation:
Bytes objects are immutable sequences of single bytes.
[...]
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256 [...]
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1).
Bold emphasis mine. Note that indexing a string is a bit of an exception among the sequence types: 'abc'[0] gives you a str object of length one; str is the only sequence type that always contains elements of its own type.
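To make the difference concrete, here is a small sketch that puts indexing and slicing side by side for bytes and str. The variable names are just for illustration; wrapping the integer in a list and passing it to bytes() is one way to get a length-1 bytes object back.

```python
my_bytes = b'just a byte string'

first_int = my_bytes[0]          # indexing a bytes object yields an int
first_slice = my_bytes[0:1]      # slicing yields a bytes object of length 1
rebuilt = bytes([first_int])     # a one-element list of ints converts back to bytes

text = 'just a text string'
first_char = text[0]             # indexing a str yields a str of length 1

print(first_int)    # 106
print(first_slice)  # b'j'
print(rebuilt)      # b'j'
print(first_char)   # j
```

Be careful to pass a list (or another iterable) when converting back: bytes(106) does not give b'j', it gives 106 zero bytes.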
This echoes how other languages treat string data; in C, the unsigned char type is also effectively an integer in the range 0-255. Whether an unqualified char is signed or unsigned is implementation-defined, so some C compilers default to unsigned, and text is modelled as a char[] array.
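The 0-255 restriction is enforced on the Python side too. As a rough sketch (fits_in_byte is a hypothetical helper written for this example, not something from the standard library), you can see that bytes() rejects integers outside that range with a ValueError:

```python
def fits_in_byte(value):
    """Return True if bytes() accepts value as a single element."""
    try:
        bytes([value])
    except ValueError:
        # Raised for integers outside the range 0 <= x < 256
        return False
    return True

results = [(value, fits_in_byte(value)) for value in (-1, 0, 255, 256)]
print(results)  # [(-1, False), (0, True), (255, True), (256, False)]
```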