Good ways to "expand" a numpy ndarray?

Are there good ways to "expand" a numpy ndarray? Say I have an ndarray like this:

[[1 2]
 [3 4]]

And I want each row to contain more elements, padded with zeros:

[[1 2 0 0 0]
 [3 4 0 0 0]]

I know there must be some brute-force ways to do so (say, construct a bigger array of zeros and then copy the elements from the old, smaller array); I'm just wondering whether there are pythonic ways to do so. I tried numpy.reshape but it didn't work:

import numpy as np
a = np.array([[1, 2], [3, 4]])
np.reshape(a, (2, 5))

Numpy complains that: ValueError: total size of new array must be unchanged


Solution 1:

You can use numpy.pad, as follows:

>>> import numpy as np
>>> a=[[1,2],[3,4]]
>>> np.pad(a, ((0,0),(0,3)), mode='constant', constant_values=0)
array([[1, 2, 0, 0, 0],
       [3, 4, 0, 0, 0]])

Here np.pad says, "Take the array a and add 0 rows above it, 0 rows below it, 0 columns to the left of it, and 3 columns to the right of it. Fill these columns with a constant specified by constant_values".
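As a small sketch of how the pad_width argument maps onto axes (the variable names right and below are only illustrative), each axis gets one (before, after) pair, so the same call can pad rows instead of columns:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# pad_width holds one (before, after) pair per axis:
# the first pair applies to rows (axis 0), the second to columns (axis 1).
right = np.pad(a, ((0, 0), (0, 3)), mode='constant', constant_values=0)
below = np.pad(a, ((0, 1), (0, 0)), mode='constant', constant_values=0)

print(right.shape)  # (2, 5): three zero columns added on the right
print(below.shape)  # (3, 2): one zero row added at the bottom
```

Note that np.pad returns a new array; a itself is left unchanged.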

Solution 2:

There are the index tricks r_ and c_.

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> z = np.zeros((2, 3), dtype=a.dtype)
>>> np.c_[a, z]
array([[1, 2, 0, 0, 0],
       [3, 4, 0, 0, 0]])

If this is performance-critical code, you might prefer to use the equivalent np.concatenate rather than the index tricks.

>>> np.concatenate((a,z), axis=1)
array([[1, 2, 0, 0, 0],
       [3, 4, 0, 0, 0]])

There are also np.resize and np.ndarray.resize, but they have some limitations (due to the way numpy lays out data in memory), so read their docstrings. You will probably find that simply concatenating is better.
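To illustrate why the resize functions are probably not what you want here, the following sketch compares the two on the array from the question. The names repeated and b are only illustrative, and refcheck=False is passed on the assumption that nothing else needs the old buffer:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# np.resize returns a new array, filled with repeated copies of the
# flattened input rather than with zeros.
repeated = np.resize(a, (2, 5))
print(repeated)

# ndarray.resize works in place and does fill with zeros, but it extends
# the flat memory buffer, so the original rows are not preserved.
b = a.copy()
b.resize((2, 5), refcheck=False)
print(b)
```

Neither result has the original rows followed by zeros, which is why padding or concatenating is the better fit for this task.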

By the way, when I've needed to do this I usually just do it the basic way you've already mentioned (create an array of zeros and assign the smaller array into it). I don't see anything wrong with that!
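For reference, a minimal sketch of that basic approach, using the array from the question (the name out and the target shape (2, 5) are just for illustration):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# Allocate the larger array of zeros, keeping the original dtype.
out = np.zeros((2, 5), dtype=a.dtype)

# Copy the small array into the top-left corner with a slice assignment.
out[:a.shape[0], :a.shape[1]] = a

print(out)
```

Using a.shape in the slice keeps the assignment correct if the size of a changes, as long as it still fits inside out.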

Solution 3:

Just to be clear: there's no "good" way to extend a NumPy array, as NumPy arrays are not expandable. Once the array is defined, the space it occupies in memory (the number of its elements multiplied by the size of each element) is fixed and cannot be changed. The only thing you can do is create a new array and replace some of its elements with the elements of the original array.
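A quick way to see this, sketched with np.concatenate (the name b is illustrative): the "extended" result is a separate array with its own memory, so changing it does not affect the original.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.concatenate((a, np.zeros((2, 3), dtype=a.dtype)), axis=1)

# The result is a freshly allocated array, not a view of a.
print(np.shares_memory(a, b))  # False

# Writing to the new array leaves the original untouched.
b[0, 0] = 99
print(a[0, 0])  # 1
```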

A lot of functions are available for convenience (np.concatenate and its np.*stack shortcuts, np.column_stack, the indexing routines np.r_ and np.c_...), but they are just that: convenience functions. Some of them are optimized at the C level (np.concatenate and others, I think), some are not.

Note that there's nothing at all wrong with your initial suggestion of creating a larger array 'by hand' (possibly filled with zeros) and filling it yourself with your initial array. It might be more readable than more complicated solutions.