Is "x < y < z" faster than "x < y and y < z"?

Solution 1:

The difference is that in x < y < z, y is evaluated only once. This makes little difference if y is a variable, but it matters when y is a function call that takes some time to compute.

from time import sleep
def y():
    sleep(.2)
    return 1.3
%timeit 1.2 < y() < 1.8
10 loops, best of 3: 203 ms per loop
%timeit 1.2 < y() and y() < 1.8
1 loops, best of 3: 405 ms per loop
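
The same effect can be checked outside IPython by counting calls instead of timing them. This is a minimal sketch: the `calls` counter and the `chained_calls` / `split_calls` names are made up for illustration, and y() here returns immediately rather than sleeping.

```python
calls = 0  # hypothetical counter, not part of the original answer

def y():
    """Stand-in for an expensive function; records each invocation."""
    global calls
    calls += 1
    return 1.3

# Chained form: y() appears once and is evaluated once.
calls = 0
chained = 1.2 < y() < 1.8
chained_calls = calls

# Split form: y() appears twice and runs twice when the first test passes.
calls = 0
split = 1.2 < y() and y() < 1.8
split_calls = calls

print("chained:", chained, "calls:", chained_calls)
print("split:  ", split, "calls:", split_calls)
```

Both expressions give the same answer here; only the number of calls differs, which is where the doubled running time above comes from.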

Solution 2:

Optimal bytecode for both of the functions you defined would be

          0 LOAD_CONST               0 (None)
          3 RETURN_VALUE

because the result of the comparison is not used. Let's make the situation more interesting by returning the result of the comparison. Let's also have the result not be knowable at compile time.

def interesting_compare(y):
    x = 1.1
    z = 1.3
    return x < y < z  # or: x < y and y < z

Again, the two versions of the comparison are semantically identical, so the optimal bytecode is the same for both constructs. As best I can work it out, it would look like this. I've annotated each line with the stack contents before and after each opcode, in Forth notation (top of stack at right, -- divides before and after, trailing ? indicates something that might or might not be there). Note that RETURN_VALUE discards everything that happens to be left on the stack underneath the value returned.

          0 LOAD_FAST                0 (y)    ;          -- y
          3 DUP_TOP                           ; y        -- y y
          4 LOAD_CONST               0 (1.1)  ; y y      -- y y 1.1
          7 COMPARE_OP               4 (>)    ; y y 1.1  -- y pred
         10 JUMP_IF_FALSE_OR_POP     19       ; y pred   -- y
         13 LOAD_CONST               1 (1.3)  ; y        -- y 1.3
         16 COMPARE_OP               0 (<)    ; y 1.3    -- pred
     >>  19 RETURN_VALUE                      ; y? pred  --

If an implementation of the language, CPython, PyPy, whatever, does not generate this bytecode (or its own equivalent sequence of operations) for both variations, that demonstrates the poor quality of that bytecode compiler. Getting from the bytecode sequences you posted to the above is a solved problem (I think all you need for this case is constant folding, dead code elimination, and better modeling of the contents of the stack; common subexpression elimination would also be cheap and valuable), and there's really no excuse for not doing it in a modern language implementation.
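
To see what a given interpreter actually emits for the two spellings, the standard `dis` module can disassemble both. This is only a sketch: the function names `chained` and `split` are invented here, and the opcodes printed depend on the Python version, so no expected output is shown.

```python
import dis

def chained(y):
    x = 1.1
    z = 1.3
    return x < y < z

def split(y):
    x = 1.1
    z = 1.3
    return x < y and y < z

# Print the bytecode of each version for side-by-side comparison.
# The exact instructions vary between interpreter versions.
print("chained:")
dis.dis(chained)
print("split:")
dis.dis(split)
```

Whatever the listings look like, the two functions should return the same result for any plain numeric argument.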

Now, it happens that all current implementations of the language have poor-quality bytecode compilers. But you should ignore that while coding! Pretend the bytecode compiler is good, and write the most readable code. It will probably be plenty fast enough anyway. If it isn't, look for algorithmic improvements first, and give Cython a try second -- that will provide far more improvement for the same effort than any expression-level tweaks you might apply.

Solution 3:

Since the difference in the output seems to be due to a lack of optimization, I think you should ignore it in most cases; it may well go away. The difference arises because y must be evaluated only once, which is achieved by duplicating it on the stack, and that requires an extra POP_TOP. It might be possible to use LOAD_FAST instead, though.

The important difference, though, is that in x<y and y<z, y is evaluated twice if x<y evaluates to true. This matters if evaluating y takes considerable time or has side effects.
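
A side effect can even change the result. In this small sketch the middle operand is next() on an iterator, so each evaluation consumes a value; the helper `make_source` and the sample values are made up for illustration.

```python
def make_source():
    """Return a fresh iterator whose successive values differ."""
    return iter([2, 5])

# Chained form: next(src) is evaluated once and yields 2,
# so the test is 1 < 2 < 3.
src = make_source()
chained = 1 < next(src) < 3

# Split form: the first next(src) yields 2 and the second yields 5,
# so the test is (1 < 2) and (5 < 3).
src = make_source()
split = 1 < next(src) and next(src) < 3

print(chained)  # True
print(split)    # False
```

In both forms the second comparison is skipped when the first one is false, so the difference only shows up when x<y holds.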

In most scenarios you should use x<y<z despite the fact that it is somewhat slower.
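
If you want to know how large the gap is on your own interpreter, the standard `timeit` module can measure both forms. This is a rough sketch; the variable values and the repeat counts are arbitrary choices, and the numbers will differ between machines and Python versions.

```python
import timeit

# Plain float variables, so the cost of evaluating y itself is negligible.
setup = "x, y, z = 1.1, 1.2, 1.3"

# Take the best of several runs to reduce noise from other processes.
t_chained = min(timeit.repeat("x < y < z", setup=setup,
                              number=200_000, repeat=5))
t_split = min(timeit.repeat("x < y and y < z", setup=setup,
                            number=200_000, repeat=5))

print(f"chained: {t_chained:.4f} s per 200,000 evaluations")
print(f"split:   {t_split:.4f} s per 200,000 evaluations")
```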
