Why does defining __getitem__ on a class make it iterable in python?
Why does defining __getitem__ on a class make it iterable?
For instance if I write:
class b:
def __getitem__(self, k):
return k
cb = b()
for k in cb:
print k
I get the output:
0
1
2
3
4
5
6
7
8
...
I would really expect to see an error returned from "for k in cb:"
Iteration's support for __getitem__
can be seen as a "legacy feature" which allowed smoother transition when PEP234 introduced iterability as a primary concept. It only applies to classes without __iter__
whose __getitem__
accepts integers 0, 1, &c, and raises IndexError
once the index gets too high (if ever), typically "sequence" classes coded before __iter__
appeared (though nothing stops you from coding new classes this way too).
Personally, I would rather not rely on this in new code, though it's not deprecated nor is it going away (works fine in Python 3 too), so this is just a matter of style and taste ("explicit is better than implicit" so I'd rather explicitly support iterability rather than rely on __getitem__
supporting it implicitly for me -- but, not a bigge).
If you take a look at PEP234 defining iterators, it says:
1. An object can be iterated over with "for" if it implements
__iter__() or __getitem__().
2. An object can function as an iterator if it implements next().
__getitem__
predates the iterator protocol, and was in the past the only way to make things iterable. As such, it's still supported as a method of iterating. Essentially, the protocol for iteration is:
Check for an
__iter__
method. If it exists, use the new iteration protocol.Otherwise, try calling
__getitem__
with successively larger integer values until it raises IndexError.
(2) used to be the only way of doing this, but had the disadvantage that it assumed more than was needed to support just iteration. To support iteration, you had to support random access, which was much more expensive for things like files or network streams where going forwards was easy, but going backwards would require storing everything. __iter__
allowed iteration without random access, but since random access usually allows iteration anyway, and because breaking backward compatability would be bad, __getitem__
is still supported.
Special methods such as __getitem__
add special behaviors to objects, including iteration.
http://docs.python.org/reference/datamodel.html#object.getitem
"for loops expect that an IndexError will be raised for illegal indexes to allow proper detection of the end of the sequence."
Raise IndexError to signal the end of the sequence.
Your code is basically equivalent to:
i = 0
while True:
try:
yield object[i]
i += 1
except IndexError:
break
Where object is what you're iterating over in the for loop.
This is so for historical reasons. Prior to Python 2.2 __getitem__ was the only way to create a class that could be iterated over with the for loop. In 2.2 the __iter__ protocol was added but to retain backwards compatibility __getitem__ still works in for loops.