Numpy first occurrence of value greater than existing value
I have a 1D array in numpy and I want to find the index of the first element that exceeds a given value.
E.g.
aa = np.arange(-10, 10)
Find the position in aa where the value 5 gets exceeded.
Solution 1:
This is a little faster (and looks nicer):
np.argmax(aa>5)
This works because argmax stops at the first True ("In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.") and doesn't build a second array of indices the way where and nonzero do.
In [2]: N = 10000
In [3]: aa = np.arange(-N,N)
In [4]: timeit np.argmax(aa>N/2)
100000 loops, best of 3: 52.3 us per loop
In [5]: timeit np.where(aa>N/2)[0][0]
10000 loops, best of 3: 141 us per loop
In [6]: timeit np.nonzero(aa>N/2)[0][0]
10000 loops, best of 3: 142 us per loop
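One caveat with the argmax approach is worth a quick illustration: if no element exceeds the threshold, the boolean mask is all False and argmax still returns 0, which is indistinguishable from a match at the first position. The sketch below shows this on the array from the question; the variable names and the threshold 100 are made up for the example.

```python
import numpy as np

aa = np.arange(-10, 10)

# Normal case: the first element greater than 5 is 6, which sits at index 16.
mask = aa > 5
idx_found = int(np.argmax(mask))

# Pitfall: nothing exceeds 100, so the mask is all False and argmax returns 0.
empty_mask = aa > 100
idx_misleading = int(np.argmax(empty_mask))

# Guard: only trust the index if at least one element matched.
idx_guarded = int(np.argmax(empty_mask)) if empty_mask.any() else None
```

The extra any() call costs one more pass over the mask, so it is only worth adding when a missing match is actually possible.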
Solution 2:
Given that the content of your array is sorted, there is an even faster method: searchsorted.
import numpy as np
N = 10000
aa = np.arange(-N,N)
%timeit np.searchsorted(aa, N/2)+1
%timeit np.argmax(aa>N/2)
%timeit np.where(aa>N/2)[0][0]
%timeit np.nonzero(aa>N/2)[0][0]
# Output
100000 loops, best of 3: 5.97 µs per loop
10000 loops, best of 3: 46.3 µs per loop
10000 loops, best of 3: 154 µs per loop
10000 loops, best of 3: 154 µs per loop
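A note on the searchsorted call, as a hedged sketch rather than a correction of the timings: with the default side="left", searchsorted returns the first index whose value is greater than or equal to the threshold, so adding 1 gives the first strictly greater element only when the threshold occurs exactly once in the array. Passing side="right" appears to cover the other cases too. The small array and the helper name first_greater_sorted below are made up for illustration.

```python
import numpy as np

# Sorted array in which 5 is absent and 4 is repeated.
aa = np.array([-3, 0, 2, 4, 4, 7, 9])


def first_greater_sorted(a, value):
    """Index of the first element of sorted `a` that is strictly > value.

    Returns len(a) if no element is greater than value.
    """
    return int(np.searchsorted(a, value, side="right"))


# Default side="left" plus one overshoots when the threshold is absent ...
left_plus_one_absent = int(np.searchsorted(aa, 5)) + 1
# ... and undershoots when the threshold is repeated.
left_plus_one_repeated = int(np.searchsorted(aa, 4)) + 1

right_absent = first_greater_sorted(aa, 5)
right_repeated = first_greater_sorted(aa, 4)
right_none = first_greater_sorted(aa, 9)
```

For the benchmark array above, where N/2 is present exactly once, both forms give the same index.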
Solution 3:
I was also interested in this and I've compared all the suggested answers with perfplot. (Disclaimer: I'm the author of perfplot.)
If you know that the array you're looking through is already sorted, then
numpy.searchsorted(a, alpha)
is for you. It's an O(log(n)) operation, i.e., the runtime hardly depends on the size of the array. You can't get faster than that.
If you don't know anything about your array, you can't go wrong with
numpy.argmax(a > alpha)
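The two recommendations can be folded into one small helper. This is only a sketch under stated assumptions: the function name first_index_above and the assume_sorted flag are hypothetical, the caller is trusted to know whether the data is sorted, and None is used as an arbitrary "not found" marker.

```python
import numpy as np


def first_index_above(a, alpha, assume_sorted=False):
    """Return the index of the first element of `a` strictly greater than alpha.

    Returns None if no such element exists. `assume_sorted` is not verified.
    """
    a = np.asarray(a)
    if assume_sorted:
        # Binary search; side="right" makes the comparison strict (a > alpha).
        idx = int(np.searchsorted(a, alpha, side="right"))
        return idx if idx < a.size else None
    # Linear scan over a boolean mask; works for any ordering.
    mask = a > alpha
    return int(np.argmax(mask)) if mask.any() else None


data = np.array([0.9, 0.1, 0.6, 0.3])
unsorted_result = first_index_above(data, 0.5)
sorted_result = first_index_above(np.sort(data), 0.5, assume_sorted=True)
missing_result = first_index_above(data, 1.0)
```

Calling the helper with assume_sorted=True on unsorted data gives meaningless results, so the flag should only be set when the ordering is guaranteed.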
Already sorted: [plot not shown]
Unsorted: [plot not shown]
Code to reproduce the plot:
import numpy
import perfplot

alpha = 0.5
numpy.random.seed(0)


def argmax(data):
    return numpy.argmax(data > alpha)


def where(data):
    return numpy.where(data > alpha)[0][0]


def nonzero(data):
    return numpy.nonzero(data > alpha)[0][0]


def searchsorted(data):
    return numpy.searchsorted(data, alpha)


perfplot.save(
    "out.png",
    # setup=numpy.random.rand,
    setup=lambda n: numpy.sort(numpy.random.rand(n)),
    kernels=[argmax, where, nonzero, searchsorted],
    n_range=[2 ** k for k in range(2, 23)],
    xlabel="len(array)",
)