Why is early return slower than else?
This is a pure guess, and I haven't figured out an easy way to check whether it is right, but I have a theory for you.
I tried your code and got the same kind of results: without_else() is repeatedly slightly slower than with_else():
>>> T(lambda : without_else()).repeat()
[0.42015745017874906, 0.3188967452567226, 0.31984281521812363]
>>> T(lambda : with_else()).repeat()
[0.36009842032996175, 0.28962249392031936, 0.2927151355828528]
>>> T(lambda : without_else(True)).repeat()
[0.31709728471076915, 0.3172671387005721, 0.3285821242644147]
>>> T(lambda : with_else(True)).repeat()
[0.30939889008243426, 0.3035132258429485, 0.3046679117038593]
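(The question's code isn't reproduced here; a minimal harness that matches the output above might look like the sketch below, where T is assumed to be timeit.Timer and the function bodies are reconstructed from the question's description.)

from timeit import Timer as T

def without_else(param=False):
    if param:
        return 1
    return 0            # early return instead of an else branch

def with_else(param=False):
    if param:
        return 1
    else:
        return 0        # explicit else branch

# Timer.repeat() runs each statement 1,000,000 times per measurement,
# three times by default on the Python 2 builds of that era.
print(T(lambda: without_else()).repeat())
print(T(lambda: with_else()).repeat())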
Considering that the bytecode is identical, the only difference is the name of the function. In particular, the timing test does a lookup on the global name. Try renaming without_else() and the difference disappears:
>>> def no_else(param=False):
        if param:
            return 1
        return 0
>>> T(lambda : no_else()).repeat()
[0.3359846013948413, 0.29025818923918223, 0.2921801513879245]
>>> T(lambda : no_else(True)).repeat()
[0.3810395594970828, 0.2969634408842694, 0.2960104566362247]
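As a quick check on the claim that the bytecode is identical, you can compare the compiled code objects directly (a sketch; on the interpreter used here the two compile identically, but other CPython versions may emit slightly different code):

import dis

def without_else(param=False):
    if param:
        return 1
    return 0

def with_else(param=False):
    if param:
        return 1
    else:
        return 0

# If the raw bytecode is the same, the if/else shape cannot explain the timing gap.
print(without_else.__code__.co_code == with_else.__code__.co_code)
dis.dis(without_else)   # inspect the instructions if the comparison is False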
My guess is that without_else has a hash collision with something else in globals() so the global name lookup is slightly slower.
Edit: A dictionary with 7 or 8 keys probably has 32 slots, so on that basis without_else has a hash collision with __builtins__:
>>> [(k, hash(k) % 32) for k in globals().keys() ]
[('__builtins__', 8), ('with_else', 9), ('__package__', 15), ('without_else', 8), ('T', 21), ('__name__', 25), ('no_else', 28), ('__doc__', 29)]
To clarify how the hashing works: __builtins__ hashes to -1196389688, which reduced modulo the table size (32) means it is stored in slot #8 of the table. without_else hashes to 505688136, which reduced modulo 32 is also 8, so there's a collision. To resolve this Python calculates:
Starting with:
j = hash % 32
perturb = hash
Repeat this until we find a free slot:
j = (5*j) + 1 + perturb;
perturb >>= 5;
use j % 2**i as the next table index;
which gives it 17 to use as the next index. Fortunately that slot is free, so the loop only runs once. The hash table size is a power of 2, so 2**i is the size of the hash table and i is the number of bits of j used as the table index.
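Here is a small sketch of that probe sequence in Python, using the hash values quoted above (they come from this particular interpreter build; on other builds, and on Python 3 with string hash randomization, the numbers will differ):

TABLE_SIZE = 32            # 2**i slots, so i = 5 bits of j are used as the index

def probe_indices(h, count):
    """Yield the first `count` table slots probed for a key with hash h."""
    j = h % TABLE_SIZE         # starting slot: hash reduced modulo the table size
    perturb = h
    yield j
    for _ in range(count - 1):
        j = (5 * j) + 1 + perturb
        perturb >>= 5
        yield j % TABLE_SIZE   # use j % 2**i as the next table index

print(list(probe_indices(-1196389688, 1)))   # __builtins__ -> [8]
print(list(probe_indices(505688136, 2)))     # without_else -> [8, 17]: collision, then a free slot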
Each probe into the table can find one of these:
The slot is empty, in that case the probing stops and we know the value is not in the table.
The slot is unused but was used in the past, in which case we try the next index calculated as above.
The slot is full but the full hash value stored in the table isn't the same as the hash of the key we are looking for (that's what happens in the case of __builtins__ vs without_else), so again we move on to the next index.
The slot is full and has exactly the hash value we want, in which case Python checks whether the key and the object we are looking up are the same object (which in this case they will be, because short strings that could be identifiers are interned, so identical identifiers use the exact same string).
Finally, when the slot is full, the hash matches exactly, but the keys are not the identical object, then and only then will Python try comparing them for equality. This is comparatively slow, but in the case of name lookups it shouldn't actually happen.
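To illustrate that interning remark (a CPython implementation detail; sys.intern is the Python 3 spelling, Python 2 had a builtin intern()):

import sys

# Identifier-like string constants are interned by the CPython compiler, so two
# equal names end up as the very same object and the dict lookup can accept the
# key on the identity check without ever comparing characters.
name_in_code = "without_else"
name_interned = sys.intern("without_else")
print(name_in_code is name_interned)   # True on CPython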