Translate every element in numpy array according to key

I am trying to translate every element of a numpy.array according to a given key:

For example:

a = np.array([[1,2,3],
              [3,2,4]])

my_dict = {1:23, 2:34, 3:36, 4:45}

I want to get:

array([[ 23.,  34.,  36.],
       [ 36.,  34.,  45.]])

I can see how to do it with a loop:

def loop_translate(a, my_dict):
    new_a = np.empty(a.shape)
    for i,row in enumerate(a):
        new_a[i,:] = list(map(my_dict.get, row))
    return new_a

Is there a more efficient and/or pure numpy way?


I timed it, and the np.vectorize method proposed by DSM is considerably faster for larger arrays:

In [13]: def loop_translate(a, my_dict):
   ....:     new_a = np.empty(a.shape)
   ....:     for i,row in enumerate(a):
   ....:         new_a[i,:] = map(my_dict.get, row)
   ....:     return new_a

In [14]: def vec_translate(a, my_dict):    
   ....:     return np.vectorize(my_dict.__getitem__)(a)

In [15]: a = np.random.randint(1,5, (4,5))

In [16]: a
array([[2, 4, 3, 1, 1],
       [2, 4, 3, 2, 4],
       [4, 2, 1, 3, 1],
       [2, 4, 3, 4, 1]])

In [17]: %timeit loop_translate(a, my_dict)
10000 loops, best of 3: 77.9 us per loop

In [18]: %timeit vec_translate(a, my_dict)
10000 loops, best of 3: 70.5 us per loop

In [19]: a = np.random.randint(1, 5, (500,500))

In [20]: %timeit loop_translate(a, my_dict)
1 loops, best of 3: 298 ms per loop

In [21]: %timeit vec_translate(a, my_dict)
10 loops, best of 3: 37.6 ms per loop

Solution 1:

I don't know about efficient, but you could use np.vectorize on the .get method of dictionaries:

>>> a = np.array([[1,2,3],
...               [3,2,4]])
>>> my_dict = {1:23, 2:34, 3:36, 4:45}
>>> np.vectorize(my_dict.get)(a)
array([[23, 34, 36],
       [36, 34, 45]])
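
Two details are worth pinning down here. np.vectorize infers the output dtype from the first translated element unless otypes is given, and my_dict.get returns None for values that are not keys. The following is a minimal sketch, assuming every element of a is a key in my_dict and that float output is wanted, as in the question. The name translate_with_vectorize is hypothetical.

```python
import numpy as np

def translate_with_vectorize(a, mapping):
    # otypes pins the result dtype; without it np.vectorize infers the
    # dtype from the first translated element.
    # mapping.__getitem__ raises KeyError for unknown values instead of
    # returning None, as mapping.get would.
    return np.vectorize(mapping.__getitem__, otypes=[float])(a)

a = np.array([[1, 2, 3],
              [3, 2, 4]])
my_dict = {1: 23, 2: 34, 3: 36, 4: 45}
result = translate_with_vectorize(a, my_dict)
```

Note that np.vectorize still calls the Python function once per element, so it is a convenience rather than a compiled loop.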

Solution 2:

Here's another approach, using numpy.unique:

>>> a = np.array([[1,2,3],[3,2,1]])
>>> a
array([[1, 2, 3],
       [3, 2, 1]])
>>> d = {1 : 11, 2 : 22, 3 : 33}
>>> u,inv = np.unique(a,return_inverse = True)
>>> np.array([d[x] for x in u])[inv].reshape(a.shape)
array([[11, 22, 33],
       [33, 22, 11]])

This approach is much faster than the np.vectorize approach when the number of unique elements in the array is small. Explanation: Python-level loops are slow. Here the Python loop only converts the unique elements; mapping them back onto the full array relies on NumPy's indexing, which is implemented in C. Hence, if the number of unique elements is comparable to the overall size of the array, there will be no speedup. On the other hand, if there are just a few unique elements, you can observe a speedup of up to 100x.
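
To make the explanation concrete, here is a rough sketch that counts how many dictionary lookups the Python loop performs compared with the number of array elements. The array size and the dictionary are taken from the timing setups used elsewhere in this thread; the variable names n_lookups and n_elements are only for illustration.

```python
import numpy as np

def unique_translate(a, d):
    # u: sorted distinct values; inv: for each element of a, its position in u
    u, inv = np.unique(a, return_inverse=True)
    # The only Python-level loop: one dictionary lookup per distinct value
    table = np.array([d[x] for x in u.tolist()])
    # Indexing with inv runs in compiled code; reshape restores a's shape
    return table[inv].reshape(a.shape)

a = np.random.randint(1, 5, (500, 500))
d = {1: 11, 2: 22, 3: 33, 4: 44}

n_lookups = np.unique(a).size   # at most 4 dictionary lookups
n_elements = a.size             # versus 250000 array elements
result = unique_translate(a, d)
```

With only a handful of distinct values, almost all of the work is done by np.unique and the final indexing step.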

Solution 3:

I think it'd be better to iterate over the dictionary, and set values in all the rows and columns "at once":

>>> a = np.array([[1,2,3],[3,2,1]])
>>> a
array([[1, 2, 3],
       [3, 2, 1]])
>>> d = {1 : 11, 2 : 22, 3 : 33}
>>> for k,v in d.items():
...     a[a == k] = v
>>> a
array([[11, 22, 33],
       [33, 22, 11]])
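
One thing to watch with the in-place version: because a is overwritten while the loop runs, a value written in one pass can be matched again by a later key whenever keys and values overlap. Below is a minimal sketch that writes into a separate output array instead. The function name masked_translate and the dictionary chained (whose values are also keys) are hypothetical, chosen only to show the effect.

```python
import numpy as np

def masked_translate(a, d):
    # Start from a copy so elements without a matching key keep their value
    out = a.copy()
    for k, v in d.items():
        # Masks are computed on the untouched input, so a freshly written
        # value is never re-translated by a later key
        out[a == k] = v
    return out

a = np.array([[1, 2, 3],
              [3, 2, 1]])
chained = {1: 2, 2: 3, 3: 1}   # hypothetical: every value is also a key

safe = masked_translate(a, chained)

# The in-place loop, for comparison: earlier writes get matched again
in_place = a.copy()
for k, v in chained.items():
    in_place[in_place == k] = v
```

When the keys and values do not overlap, as in the example above with d = {1 : 11, 2 : 22, 3 : 33}, both versions give the same result.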


While it may not be as sexy as DSM's (really good) answer using numpy.vectorize, my tests of all the proposed methods show that this approach (using @jamylak's suggestion) is actually a bit faster:

from __future__ import division
import numpy as np
a = np.random.randint(1, 5, (500,500))
d = {1 : 11, 2 : 22, 3 : 33, 4 : 44}

def unique_translate(a,d):
    u,inv = np.unique(a,return_inverse = True)
    return np.array([d[x] for x in u])[inv].reshape(a.shape)

def vec_translate(a, d):    
    return np.vectorize(d.__getitem__)(a)

def loop_translate(a,d):
    n = np.empty(a.shape)  # uninitialized; entries of a that are not keys of d stay undefined
    for k in d:
        n[a == k] = d[k]
    return n

def orig_translate(a, d):
    new_a = np.empty(a.shape)
    for i,row in enumerate(a):
        new_a[i,:] = list(map(d.get, row))
    return new_a

if __name__ == '__main__':
    import timeit
    n_exec = 100
    print('orig')
    print(timeit.timeit("orig_translate(a,d)",
                        setup="from __main__ import np,a,d,orig_translate",
                        number = n_exec) / n_exec)
    print('unique')
    print(timeit.timeit("unique_translate(a,d)",
                        setup="from __main__ import np,a,d,unique_translate",
                        number = n_exec) / n_exec)
    print('vec')
    print(timeit.timeit("vec_translate(a,d)",
                        setup="from __main__ import np,a,d,vec_translate",
                        number = n_exec) / n_exec)
    print('loop')
    print(timeit.timeit("loop_translate(a,d)",
                        setup="from __main__ import np,a,d,loop_translate",
                        number = n_exec) / n_exec)

