How to replace values in numpy matrix columns with values from another array?

Solution 1:

Assuming the matrices are large, you can implement a fast parallel version with Numba. It is much faster than the initial solution, whose pure-Python loops create many small temporary arrays and access memory in an inefficient non-contiguous pattern (e.g. a[:,j]). It is also significantly faster than np.where(a == 7, b, a), which has to fill huge temporary arrays that may not fit in RAM (forcing the OS to fall back on very slow swap memory). Using multiple threads provides a further large speed-up. Here is the code:

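import numba as nb

# Eager compilation for C-contiguous arrays of Numba's int_ type (np.int_).
@nb.njit('void(int_[:,::1], int_[::1])', parallel=True)
def compute(a, b):
    n, m = a.shape
    assert b.size == m
    # Rows are distributed across threads; each row is contiguous in memory.
    for i in nb.prange(n):
        for j in range(m):
            if a[i,j] == 7:
                a[i,j] = b[j]  # in-place replacement, no temporary arrays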
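
Because the signature is given explicitly, the function should only accept C-contiguous arrays whose dtype matches Numba's int_ (that is, np.int_). A minimal sketch of preparing inputs accordingly, assuming the data may arrive with another dtype or memory layout; prepare is a hypothetical helper and not part of the code above:

```python
import numpy as np

def prepare(a, b):
    # Hypothetical helper: return C-contiguous np.int_ versions of a and b.
    # np.ascontiguousarray copies only when the dtype or layout differs, so
    # in-place changes made later reach the caller's array only when no copy
    # was needed.
    a2 = np.ascontiguousarray(a, dtype=np.int_)
    b2 = np.ascontiguousarray(b, dtype=np.int_)
    return a2, b2

# Example input with the "wrong" dtype (int16) and layout (Fortran order).
a = np.asfortranarray(np.array([[7, 1], [2, 7]], dtype=np.int16))
b = np.array([10, 20], dtype=np.int16)
a2, b2 = prepare(a, b)
```

If a copy was made, pass the returned arrays to compute and read the result from them rather than from the originals.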

Here are the results on my 6-core machine for a 100000x1000 matrix (filled with random 32-bit integers in 0..10):

For loop:  683 ms
np.where:  169 ms
Numba:      37 ms

This version is about 18 times faster than the initial version (and about 4.6 times faster than np.where), and it uses almost no additional memory since it works in place.
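
For reference, the two NumPy baselines in the timings might look roughly like the sketch below. The exact loop from the question is not shown here, so replace_loop is an assumption about its shape; the function names and the small test data are illustrative only.

```python
import numpy as np

def replace_loop(a, b, value=7):
    # Column-by-column baseline (assumed form of the "for loop" version):
    # each a[:, j] is a strided, non-contiguous view of a C-ordered array,
    # and each comparison creates a small temporary boolean mask.
    for j in range(a.shape[1]):
        col = a[:, j]
        col[col == value] = b[j]
    return a

def replace_where(a, b, value=7):
    # b (shape (m,)) broadcasts across the rows of a (shape (n, m)).
    # This builds a full-size boolean mask plus a full-size result array.
    return np.where(a == value, b, a)

rng = np.random.default_rng(0)
a = rng.integers(0, 10, size=(1000, 50))
b = rng.integers(100, 200, size=50)

expected = replace_where(a, b)       # out-of-place
result = replace_loop(a.copy(), b)   # in-place on a copy
```

Both functions should produce the same matrix; they differ only in how much temporary memory they allocate and how they traverse it.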