How to extend an array in-place in Numpy?

Currently, I have some code like this

import numpy as np
ret = np.array([])
for i in range(100000):
  tmp =  get_input(i)
  ret = np.append(ret, np.zeros(len(tmp)))
  ret = np.append(ret, np.ones(fixed_length))

I think this code is not efficient as np.append needs to return a copy of the array instead of modify the ret in-place

I was wondering whether I can use the extend for a numpy array like this:

import numpy as np
from somewhere import np_extend
ret = np.array([])
for i in range(100000):
  tmp =  get_input(i)
  np_extend(ret, np.zeros(len(tmp)))
  np_extend(ret, np.ones(fixed_length))

So that the extend would be much more efficient. Does anyone have ideas about this? Thanks!


Solution 1:

Imagine a numpy array as occupying one contiguous block of memory. Now imagine other objects, say other numpy arrays, which are occupying the memory just to the left and right of our numpy array. There would be no room to append to or extend our numpy array. The underlying data in a numpy array always occupies a contiguous block of memory.

So any request to append to or extend our numpy array can only be satisfied by allocating a whole new larger block of memory, copying the old data into the new block and then appending or extending.

So:

  1. It will not occur in-place.
  2. It will not be efficient.

Solution 2:

You can use the .resize() method of ndarrays. It requires that the memory is not referred to by other arrays/variables.

import numpy as np
ret = np.array([])
for i in range(100):
    tmp = np.random.rand(np.random.randint(1, 100))
    ret.resize(len(ret) + len(tmp)) # <- ret is not referred to by anything else,
                                    #    so this works
    ret[-len(tmp):] = tmp

The efficiency can be improved by using the usual array memory overrallocation schemes.