Why is math.sqrt massively slower than exponentiation?
Python 3 is precomputing the value of 2 ** 0.5 at compile time, since both operands are known at that time. The value of sqrt, however, is not known at compile time, so the computation necessarily occurs at run time. You aren't timing how long it takes to compute 2 ** 0.5, but just the time it takes to load a constant.
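One quick way to see that folding is to look at the constants stored on the compiled code object; a rough sketch, keeping in mind that the exact contents of co_consts differ between CPython versions:

# The literal expression is folded into a single constant at compile time.
folded = compile("2 ** 0.5", "<string>", "eval")
print(folded.co_consts)    # expect something like (1.4142135623730951,)

# With a name involved, only the 0.5 literal can be stored as a constant;
# the power itself has to be computed at run time.
unfolded = compile("x ** 0.5", "<string>", "eval")
print(unfolded.co_consts)  # expect something like (0.5,)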
A fairer comparison would be
$ python3 -m timeit -s "from math import sqrt" "sqrt(2)"
5000000 loops, best of 5: 50.7 nsec per loop
$ python3 -m timeit -s "x = 2" "x**0.5"
5000000 loops, best of 5: 56.7 nsec per loop
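If you prefer to run the comparison from inside the interpreter, a rough equivalent using the timeit module looks like this (absolute numbers will of course vary from machine to machine):

import timeit

N = 5000000

# The literal only times loading a precomputed constant.
print(timeit.timeit("2 ** 0.5", number=N))

# The fairer comparisons: an actual power computation and an actual function call.
print(timeit.timeit("x ** 0.5", setup="x = 2", number=N))
print(timeit.timeit("sqrt(2)", setup="from math import sqrt", number=N))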
I'm not sure if there is a way to show unoptimized byte code. Python starts by parsing source code into an abstract syntax tree (AST):
>>> import ast
>>> ast.dump(ast.parse("2**0.5"))
'Module(body=[Expr(value=BinOp(left=Num(n=2), op=Pow(), right=Num(n=0.5)))])'
Update: This particular optimization is now applied directly to the abstract syntax tree, so the byte code is generated directly from something like
Module(body=Num(n=1.4142135623730951))
The ast module doesn't appear to apply the optimization.
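One way to check where the folding happens is to parse the expression and then compile the resulting tree; on recent CPython versions the optimizer appears to run as part of compile() rather than ast.parse(), so the parsed tree keeps the BinOp while the compiled code carries the folded constant:

import ast
import dis

tree = ast.parse("2 ** 0.5")
print(ast.dump(tree))      # still shows a BinOp(..., op=Pow(), ...) node

code = compile(tree, "<ast>", "exec")
dis.dis(code)              # expect a single LOAD_CONST of 1.4142135623730951
print(code.co_consts)      # the folded value should show up here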
The compiler takes the AST and generates unoptimized byte code; in this case, I believe it would look (based on the output of dis.dis("2**x") and dis.dis("x**0.5")) like
LOAD_CONST 0 (2)
LOAD_CONST 1 (0.5)
BINARY_POWER
RETURN_VALUE
The raw byte code is then subject to modification by the peephole optimizer, which can reduce these 4 instructions to 2, as shown by the dis module.
The compiler then generates byte code from the AST.
>>> import dis
>>> dis.dis("2**0.5")
1 0 LOAD_CONST 0 (1.4142135623730951)
2 RETURN_VALUE
[While the following paragraph was originally written with the idea of optimizing byte code in mind, the reasoning applies to optimizing the AST as well.]
Since nothing at runtime affects how the two LOAD_CONST instructions and the following BINARY_POWER instruction are evaluated (for example, there are no name lookups), the peephole optimizer can take this sequence of byte codes, perform the computation of 2**0.5 itself, and replace the first three instructions with a single LOAD_CONST instruction that loads the result immediately.
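For contrast, a single name lookup is enough to block the folding; a quick check (opcode names differ across versions, e.g. BINARY_POWER on older CPython vs. BINARY_OP (**) on 3.11 and later):

import dis

# Nothing to fold here: x is only known at run time, so the compiler
# has to emit a name load followed by an actual power operation.
dis.dis("x ** 0.5")
# Typical shape of the output (details vary by version):
#   LOAD_NAME     0 (x)
#   LOAD_CONST    0 (0.5)
#   BINARY_POWER               (BINARY_OP (**) on newer versions)
#   RETURN_VALUE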
To enhance chepner's answer, here's a proof:
Python 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> dis.dis('2 ** 0.5')
1 0 LOAD_CONST 2 (1.4142135623730951)
3 RETURN_VALUE
vs.
>>> dis.dis('sqrt(2)')
1 0 LOAD_NAME 0 (sqrt)
3 LOAD_CONST 0 (2)
6 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
9 RETURN_VALUE
>>> dis.dis('44442.3123 ** 0.5')
0 LOAD_CONST 0 (210.81345379268373)
2 RETURN_VALUE
I do not believe that 44442.3123 ** 0.5 is precomputed at compile time. We had better check the AST of the code.
>>> import ast
>>> import math
>>> code = ast.parse("2**2")
>>> ast.dump(code)
'Module(body=[Expr(value=BinOp(left=Num(n=2), op=Pow(), right=Num(n=2)))])'
>>> code = ast.parse("math.sqrt(3)")
>>> ast.dump(code)
"Module(body=[Expr(value=Call(func=Attribute(value=Name(id='math', ctx=Load()), attr='sqrt', ctx=Load()), args=[Num(n=3)], keywords=[]))])"