Numpy: find first index of value fast
How can I find the index of the first occurrence of a number in a Numpy array? Speed is important to me. I am not interested in the following answers because they scan the whole array and don't stop when they find the first occurrence:
itemindex = numpy.where(array==item)[0][0]
nonzero(array == item)[0][0]
Note 1: none of the answers from that question seem relevant Is there a Numpy function to return the first index of something in an array?
Note 2: using a C-compiled method is preferred to a Python loop.
There is a feature request for this scheduled for Numpy 2.0.0: https://github.com/numpy/numpy/issues/2269
Although it is way too late for you, but for future reference: Using numba (1) is the easiest way until numpy implements it. If you use anaconda python distribution it should already be installed. The code will be compiled so it will be fast.
@jit(nopython=True)
def find_first(item, vec):
"""return the index of the first occurence of item in vec"""
for i in xrange(len(vec)):
if item == vec[i]:
return i
return -1
and then:
>>> a = array([1,7,8,32])
>>> find_first(8,a)
2
I've made a benchmark for several methods:
argwhere
-
nonzero
as in the question -
.tostring()
as in @Rob Reilink's answer - python loop
- Fortran loop
The Python and Fortran code are available. I skipped the unpromising ones like converting to a list.
The results on log scale. X-axis is the position of the needle (it takes longer to find if it's further down the array); last value is a needle that's not in the array. Y-axis is the time to find it.
The array had 1 million elements and tests were run 100 times. Results still fluctuate a bit, but the qualitative trend is clear: Python and f2py quit at the first element so they scale differently. Python gets too slow if the needle is not in the first 1%, whereas f2py
is fast (but you need to compile it).
To summarize, f2py is the fastest solution, especially if the needle appears fairly early.
It's not built in which is annoying, but it's really just 2 minutes of work. Add this to a file called search.f90
:
subroutine find_first(needle, haystack, haystack_length, index)
implicit none
integer, intent(in) :: needle
integer, intent(in) :: haystack_length
integer, intent(in), dimension(haystack_length) :: haystack
!f2py intent(inplace) haystack
integer, intent(out) :: index
integer :: k
index = -1
do k = 1, haystack_length
if (haystack(k)==needle) then
index = k - 1
exit
endif
enddo
end
If you're looking for something other than integer
, just change the type. Then compile using:
f2py -c -m search search.f90
after which you can do (from Python):
import search
print(search.find_first.__doc__)
a = search.find_first(your_int_needle, your_int_array)
You can convert a boolean array to a Python string using array.tostring()
and then using the find() method:
(array==item).tostring().find('\x01')
This does involve copying the data, though, since Python strings need to be immutable. An advantage is that you can also search for e.g. a rising edge by finding \x00\x01
In case of sorted arrays np.searchsorted
works.