In numpy, how to efficiently build a mapping from each unique value to its indices, without using a for loop
I considered the following alternatives, but they are not efficient enough for my use case because I use large arrays.
The first alternative requires traversing the array with a Python for loop, which may be slow for large numpy arrays.
import numpy as np
from collections import defaultdict
a = np.array([1, 2, 6, 4, 2, 3, 2])
inv = defaultdict(list)
for i, x in enumerate(a):
    inv[x].append(i)
The second alternative is inefficient because it requires traversing the array multiple times:
import numpy as np
a = np.array([1, 2, 6, 4, 2, 3, 2])
inv = {}
for x in np.unique(a):
    inv[x] = np.flatnonzero(a == x)
EDIT: My numpy array consists of integers and is used for image segmentation. I was also looking for a method in skimage, but did not find any.
This should work.
import numpy as np

a = np.array((1, 2, 6, 2, 4, 7, 25, 6))
fwd = np.argsort(a)                           # permutation that sorts a
asorted = a[fwd]
keys = np.unique(asorted)
lower = np.searchsorted(asorted, keys)        # start of each key's run
# or: higher = itertools.chain(lower[1:], (len(asorted),))
higher = np.append(lower[1:], len(asorted))   # end of each key's run
inv = {key: fwd[lower_i:higher_i]
for key, lower_i, higher_i
in zip(keys, lower, higher)}
assert all(np.all(a[indices] == key)
for key, indices in inv.items())
This runs in O(n log n), dominated by the sort. The only Python-level loop that remains is the one that builds the dictionary, and that step is optional, of course.
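If you want to squeeze out that last explicit loop as well, the run boundaries can come straight from np.unique(..., return_index=True), and np.split can cut fwd into one block per key. This is just a sketch of the same idea, not the original code; dict(zip(...)) is still Python-level iteration, but all the slicing stays in numpy:

import numpy as np

a = np.array((1, 2, 6, 2, 4, 7, 25, 6))
fwd = np.argsort(a)
keys, first = np.unique(a[fwd], return_index=True)  # first position of each key's run
groups = np.split(fwd, first[1:])                   # one block of original indices per key
inv = dict(zip(keys, groups))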
From a purely algorithmic standpoint, your first approach (defaultdict(list)) would be better, since it runs in amortized linear time, but of course the Python interpreter overhead may be significant.
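Since your EDIT says the array holds integer labels, a pure-numpy variant can approach linear time as well: np.bincount counts every label in one pass, and a stable argsort (which numpy maps to radix sort for integer dtypes) groups the indices. A rough sketch under those assumptions:

import numpy as np

a = np.array([1, 2, 6, 4, 2, 3, 2])
counts = np.bincount(a)                # one O(n) counting pass
order = np.argsort(a, kind='stable')   # radix sort for integer dtypes
ends = np.cumsum(counts)               # end of each label's run in `order`

labels = np.flatnonzero(counts)        # labels that actually occur
inv = {lbl: order[ends[lbl] - counts[lbl]:ends[lbl]] for lbl in labels}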
I advise you to check out numba, which can speed up numpy code in Python significantly. It supports numpy.invert() and numpy.unique() (see the documentation). Here is a good video from YouTube explaining how to use numba: Make Python code 1000x Faster with Numba.
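To illustrate what a numba version might look like, here is a minimal sketch (assuming numba is installed; group_indices and its return layout are names I made up for illustration, not a library API). A counting sort inside an @njit function builds the whole mapping in O(n + number_of_labels):

import numpy as np
from numba import njit

@njit
def group_indices(a, num_labels):
    # Count occurrences of each label.
    counts = np.zeros(num_labels, dtype=np.int64)
    for x in a:
        counts[x] += 1
    # starts[k] = position where label k's indices begin in `order`.
    starts = np.zeros(num_labels, dtype=np.int64)
    for k in range(1, num_labels):
        starts[k] = starts[k - 1] + counts[k - 1]
    # Scatter each position i into its label's slice.
    order = np.empty(a.size, dtype=np.int64)
    cursor = starts.copy()
    for i in range(a.size):
        lbl = a[i]
        order[cursor[lbl]] = i
        cursor[lbl] += 1
    return order, starts, counts

a = np.array([1, 2, 6, 4, 2, 3, 2])
order, starts, counts = group_indices(a, a.max() + 1)
idx2 = order[starts[2]:starts[2] + counts[2]]   # indices of label 2 -> array([1, 4, 6])

Note that numba compiles the function on its first call, so any timing should be done on the second call onward.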