How can I time a code segment for testing performance with Python's timeit?
I have a Python script which works just as it should, but I need to record the execution time. I've googled that I should use timeit, but I can't seem to get it to work.
My Python script looks like this:
import sys
import getopt
import timeit
import random
import os
import re
import ibm_db
import time
from string import maketrans
myfile = open("results_update.txt", "a")

for r in range(100):
    rannumber = random.randint(0, 100)
    update = "update TABLE set val = %i where MyCount >= '2010' and MyCount < '2012' and number = '250'" % rannumber
    #print rannumber
    conn = ibm_db.pconnect("dsn=myDB", "usrname", "secretPWD")

for r in range(5):
    print "Run %s\n" % r
    ibm_db.execute(query_stmt)
    query_stmt = ibm_db.prepare(conn, update)

myfile.close()
ibm_db.close(conn)
What I need is the time it takes to execute the query and write it to the file results_update.txt. The purpose is to test an update statement for my database with different indexes and tuning mechanisms.
You can use time.time() or time.clock() before and after the block you want to time.
import time

t0 = time.time()
# code_block: put the code you want to time here
t1 = time.time()
total = t1 - t0
This method is not as exact as timeit (it does not average several runs), but it is straightforward.
time.time() (on Windows and Linux) and time.clock() (on Linux) are not precise enough for fast functions (you get total = 0). In that case, or if you want to average the time elapsed over several runs, you have to call the function manually multiple times (as I think you already do in your example code, and as timeit does automatically when you set its number argument):
import time

def myfast():
    pass  # the code you want to time goes here

n = 10000
t0 = time.time()
for i in range(n): myfast()
t1 = time.time()
total_n = t1 - t0
On Windows, as Corey stated in the comments, time.clock() has much higher precision (microsecond instead of second) and is preferred over time.time().
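For example, a rough sketch of the same pattern with time.clock() (note that time.clock() was deprecated in Python 3.3 and removed in 3.8; time.perf_counter() is the modern replacement):

import time

t0 = time.clock()  # on Windows this wraps a high-resolution performance counter
for i in range(1000):
    sum(range(100))  # hypothetical fast workload, looped to get a measurable total
t1 = time.clock()
print("total: {:.6f} s".format(t1 - t0))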
If you are profiling your code and can use IPython, it has the magic function %timeit; %%timeit operates on whole cells.
In [2]: %timeit cos(3.14)
10000000 loops, best of 3: 160 ns per loop
In [3]: %%timeit
...: cos(3.14)
...: x = 2 + 3
...:
10000000 loops, best of 3: 196 ns per loop
Quite apart from the timing, the code you show is simply incorrect: you execute 100 connections (completely ignoring all but the last one), and then, when you make the first execute call, you pass it a local variable query_stmt that you only initialize after the execute call.
First, make your code correct, without worrying about timing yet: i.e., a function that makes or receives a connection and performs 100 or 500 or whatever number of updates on that connection, then closes the connection. Once your code works correctly, that is the point at which to think about using timeit on it!
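For instance, a minimal sketch of that restructuring (reusing the question's hypothetical DSN, credentials, and SQL, and fixing the prepare/execute order):

import random
import ibm_db  # the IBM DB2 driver from the question

def run_updates(conn, n_updates=100):
    """Perform n_updates prepared updates on one existing connection."""
    for _ in range(n_updates):
        update = ("update TABLE set val = %i where MyCount >= '2010' "
                  "and MyCount < '2012' and number = '250'"
                  % random.randint(0, 100))
        stmt = ibm_db.prepare(conn, update)  # prepare first...
        ibm_db.execute(stmt)                 # ...then execute

conn = ibm_db.pconnect("dsn=myDB", "usrname", "secretPWD")
run_updates(conn)
ibm_db.close(conn)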
Specifically, if the function you want to time is a parameter-less one called foobar, you can use timeit.timeit (2.6 or later -- it's more complicated in 2.5 and before). Note that the statement string is evaluated in a fresh namespace, so you need to import the function in setup (or pass the callable itself instead of a string):
timeit.timeit('foobar()', setup='from __main__ import foobar', number=1000)
Since 3.5, the globals parameter makes it straightforward to use timeit with functions that take parameters:
timeit.timeit('foobar(x, y)', number=1000, globals=globals())
You'd better specify the number of runs, because the default, one million, may be high for your use case (leading to spending a lot of time in this code ;-).
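For example, since timeit.timeit() returns the total time for all number runs rather than the average, you can divide to get a per-call figure (foobar's body here is a hypothetical stand-in):

import timeit

def foobar():
    sum(range(100))  # hypothetical parameter-less workload

n = 1000
total = timeit.timeit(foobar, number=n)  # 2.6+ also accepts the callable directly
print("total: {:.6f} s, per call: {:.9f} s".format(total, total / n))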
Focus on one specific thing. Disk I/O is slow, so I'd take that out of the test if all you are going to tweak is the database query.
And if you need to time your database execution, look for database tools instead, like asking for the query plan, and note that performance varies not only with the exact query and what indexes you have, but also with the data load (how much data you have stored).
That said, you can simply put your code in a function and run that function with timeit.timeit():
def function_to_repeat():
    # ...
    pass

duration = timeit.timeit(function_to_repeat, number=1000)
This would disable garbage collection, repeatedly call the function_to_repeat() function, and time the total duration of those calls using timeit.default_timer(), which is the most accurate available clock for your specific platform.
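If you want garbage collection to count as part of the measured work, the timeit documentation notes it can be re-enabled as the first statement of the setup string; a minimal sketch:

import timeit

def function_to_repeat():
    pass  # placeholder for the code under test

# 'gc.enable()' as setup is the documented trick to keep GC in the measurement
duration = timeit.timeit(function_to_repeat, setup='gc.enable()', number=1000)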
You should move setup code out of the repeated function; for example, you should connect to the database first, then time only the queries. Use the setup
argument to either import or create those dependencies, and pass them into your function:
def function_to_repeat(var1, var2):
    # ...
    pass

duration = timeit.timeit(
    'function_to_repeat(var1, var2)',
    'from __main__ import function_to_repeat, var1, var2',
    number=1000)

This would grab the globals function_to_repeat, var1 and var2 from your script and pass them to the function on each repetition.
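Applied to the question's database case, a sketch might look like this (hypothetical names, reusing the question's DSN and credentials; the connection is created in setup, so only the queries are timed):

import timeit
import random
import ibm_db  # the IBM DB2 driver from the question

def run_one_update(conn):
    # One prepared update; connecting is deliberately kept out of the timed code
    update = ("update TABLE set val = %i where MyCount >= '2010' "
              "and MyCount < '2012' and number = '250'"
              % random.randint(0, 100))
    ibm_db.execute(ibm_db.prepare(conn, update))

duration = timeit.timeit(
    'run_one_update(conn)',
    setup='from __main__ import run_one_update\n'
          'import ibm_db\n'
          'conn = ibm_db.pconnect("dsn=myDB", "usrname", "secretPWD")',
    number=100)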
Here's a simple wrapper for steven's answer. This function doesn't do repeated runs or averaging; it just saves you from having to repeat the timing code everywhere :)
def time_func(func, *args):
    """Print the wall time it takes to execute the given function."""
    import time
    start_time = time.time()
    func(*args)  # *args forwards zero or more positional arguments
    end_time = time.time()
    print("it took this long to run: {}".format(end_time - start_time))