Optimising np.searchsorted() with cython or alternative
I have the following functions. I was trying to optimize the np.searchsorted() function and got a good result by writing my own version in Cython. But then I realized that simply wrapping the former in a cdef cfunc() works even better, and I don't understand the mechanics of Python, Cython or NumPy well enough to know why. What would be the best and fastest/most efficient way to write this function? In my script this is one of the functions that is called several thousand times during some computations, and I'd like the cost per call to be as low as possible. Additionally, I don't need to duplicate np.searchsorted() entirely; I just need to produce the specific output i = np.searchsorted(time_points > t, True).
Case 1
%%cython --compile-args=-fopenmp --link-args=-fopenmp -a
cimport openmp
cimport cython
from cpython cimport array
import array
import cython.parallel as cp
from cython.parallel import parallel, prange
import numpy as np
cimport numpy as np
import random
import pickle
import matplotlib.pyplot as pl
from timeit import default_timer as timer
cdef int ntime = 10**4
cdef float t = 2 #some variable in general
time_points = np.arange(ntime, dtype=np.float64)
#print(time_points)
#search_time =[]
start_update = timer()
i = np.searchsorted(time_points > t, True)
print(i)
end_update = timer()
print(end_update-start_update)
Case 2
@cython.boundscheck(False) # Deactivate bounds checking
@cython.wraparound(False) # Deactivate negative indexing.
@cython.cdivision(True) # Deactivate division by 0 checking.
cdef mysearch(np.ndarray[np.float64_t, ndim=1] time_points, float val, int nt):
cdef int idx
cdef int return_val
#cdef int total
if val >= nt or nt - 1 < val < nt:
print("first")
return nt
else:
for idx in range(nt):
#print(idx)
if time_points[idx] <= val:
#return_val = np.int(time_points[idx])
#break
#print("sec")
continue
else:
#print("third", idx)
            return_val = int(time_points[idx])  # np.int is removed in recent NumPy
#continue
break
return return_val
print(mysearch(time_points, t, ntime))
#i = np.searchsorted(time_points > 2, True)
#print(i)
end_update = timer()
b = end_update - start_update
print(b)
Case 3
start_update = timer()
cdef cfunc():
i = np.searchsorted(time_points > t, True)
print(i, np.searchsorted(time_points > t, True))
return i
end_update = timer()
a = end_update - start_update
print(a)
The times for case 1, case 2 and case 3 are 0.00013096071779727936, 5.245860666036606e-05 and 6.770715117454529e-07 seconds, respectively. I am happy that case 2 works better than case 1, but why is case 3 so much faster? Is something special going on, and how can I beat those speeds?
Solution 1:
The main problem with np.searchsorted(time_points > t, True) is the sub-expression time_points > t. Indeed, while np.searchsorted runs in O(log n) time (with n the size of time_points), the expression time_points > t is computed in O(n) time and is clearly the bottleneck. In fact, you do not need this sub-expression at all: i = np.searchsorted(time_points, t, 'right') should do the job correctly.

Note that the last case does not execute anything between the two timers, as explained by @DavidW in the comments: it just defines a Cython function, which is why its "time" is so small.
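A minimal sketch (plain NumPy, no Cython needed) checking that the 'right'-sided binary search reproduces the boolean-mask result on the question's own data:

```python
import numpy as np

ntime = 10**4
t = 2.0
time_points = np.arange(ntime, dtype=np.float64)  # sorted, as required

# Original approach: builds an O(n) boolean mask, then searches it.
i_mask = np.searchsorted(time_points > t, True)

# Suggested approach: a single O(log n) binary search; 'right' places the
# insertion point after any elements equal to t, matching the first True
# position in the mask above.
i_fast = np.searchsorted(time_points, t, 'right')

print(i_mask, i_fast)  # both should be 3 for this data
```

Timing these two lines with timeit (rather than a single timer pair around a print, which mostly measures the print) should show the 'right'-sided call scaling logarithmically with ntime while the mask version scales linearly.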