Why is math.sqrt massively slower than exponentiation?
Python 3 is precomputing the value of 2 ** 0.5 at compile time, since both operands are known at that time. The value of sqrt, however, is not known at compile time, so the computation necessarily occurs at run time. You aren't timing how long it takes to compute 2 ** 0.5, but just the time it takes to load a constant.
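One quick way to see that folding is to look at the constants stored on the compiled code object; a rough sketch, keeping in mind that the exact contents of co_consts differ between CPython versions:

# The literal expression is folded into a single constant at compile time.
folded = compile("2 ** 0.5", "<string>", "eval")
print(folded.co_consts)    # expect something like (1.4142135623730951,)

# With a name involved, only the 0.5 literal can be stored as a constant;
# the power itself has to be computed at run time.
unfolded = compile("x ** 0.5", "<string>", "eval")
print(unfolded.co_consts)  # expect something like (0.5,)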
A fairer comparison would be
$ python3 -m timeit -s "from math import sqrt" "sqrt(2)"
5000000 loops, best of 5: 50.7 nsec per loop
$ python3 -m timeit -s "x = 2" "x**0.5"
5000000 loops, best of 5: 56.7 nsec per loop
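If you prefer to run the comparison from inside the interpreter, a rough equivalent using the timeit module looks like this (absolute numbers will of course vary from machine to machine):

import timeit

N = 5000000

# The literal only times loading a precomputed constant.
print(timeit.timeit("2 ** 0.5", number=N))

# The fairer comparisons: an actual power computation and an actual function call.
print(timeit.timeit("x ** 0.5", setup="x = 2", number=N))
print(timeit.timeit("sqrt(2)", setup="from math import sqrt", number=N))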
I'm not sure if there is a way to show unoptimized byte code. Python starts by parsing source code into an abstract syntax tree (AST):
>>> import ast
>>> ast.dump(ast.parse("2**0.5"))
'Module(body=[Expr(value=BinOp(left=Num(n=2), op=Pow(), right=Num(n=0.5)))])'
Update: This particular optimization is now applied directly to the abstract syntax tree, so the byte code is generated directly from something like
Module(body=Num(n=1.4142135623730951))
The ast module doesn't appear to apply the optimization.
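One way to check where the folding happens is to parse the expression and then compile the resulting tree; on recent CPython versions the optimizer appears to run as part of compile() rather than ast.parse(), so the parsed tree keeps the BinOp while the compiled code carries the folded constant:

import ast
import dis

tree = ast.parse("2 ** 0.5")
print(ast.dump(tree))      # still shows a BinOp(..., op=Pow(), ...) node

code = compile(tree, "<ast>", "exec")
dis.dis(code)              # expect a single LOAD_CONST of 1.4142135623730951
print(code.co_consts)      # the folded value should show up here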
The compiler takes the AST and generates unoptimized byte code; in this case, I believe it would look (based on the output of dis.dis("2**x") and dis.dis("x**0.5")) like
LOAD_CONST 0 (2)
LOAD_CONST 1 (0.5)
BINARY_POWER
RETURN_VALUE
The raw byte code is then subject to modification by the peephole optimizer, which can reduce these 4 instructions to 2, as shown by the dis module.
The compiler then generates byte code from the AST.
>>> import dis
>>> dis.dis("2**0.5")
1 0 LOAD_CONST 0 (1.4142135623730951)
2 RETURN_VALUE
[While the following paragraph was originally written with the idea of optimizing byte code in mind, the reasoning applies to optimizing the AST as well.]
Since nothing at runtime affects how the two LOAD_CONST instructions and the following BINARY_POWER instruction are evaluated (for example, there are no name lookups), the peephole optimizer can take this sequence of byte codes, perform the computation of 2**0.5 itself, and replace the first three instructions with a single LOAD_CONST instruction that loads the result immediately.
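For contrast, a single name lookup is enough to block the folding; a quick check (opcode names differ across versions, e.g. BINARY_POWER on older CPython vs. BINARY_OP (**) on 3.11 and later):

import dis

# Nothing to fold here: x is only known at run time, so the compiler
# has to emit a name load followed by an actual power operation.
dis.dis("x ** 0.5")
# Typical shape of the output (details vary by version):
#   LOAD_NAME     0 (x)
#   LOAD_CONST    0 (0.5)
#   BINARY_POWER               (BINARY_OP (**) on newer versions)
#   RETURN_VALUE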
To enhance chepner's answer, here's a proof:
Python 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> dis.dis('2 ** 0.5')
1 0 LOAD_CONST 2 (1.4142135623730951)
3 RETURN_VALUE
vs.
>>> dis.dis('sqrt(2)')
1 0 LOAD_NAME 0 (sqrt)
3 LOAD_CONST 0 (2)
6 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
9 RETURN_VALUE
>>> dis.dis('44442.3123 ** 0.5')
0 LOAD_CONST 0 (210.81345379268373)
2 RETURN_VALUE
I do not believe that 44442.3123 ** 0.5 is precomputed at compile time. We had better check the AST of the code.
>>> import ast
>>> import math
>>> code = ast.parse("2**2")
>>> ast.dump(code)
'Module(body=[Expr(value=BinOp(left=Num(n=2), op=Pow(), right=Num(n=2)))])'
>>> code = ast.parse("math.sqrt(3)")
>>> ast.dump(code)
"Module(body=[Expr(value=Call(func=Attribute(value=Name(id='math', ctx=Load()), attr='sqrt', ctx=Load()), args=[Num(n=3)], keywords=[]))])"