How do I create an empty array/matrix in NumPy?
Solution 1:
You have the wrong mental model for using NumPy efficiently. NumPy arrays are stored in contiguous blocks of memory. If you want to add rows or columns to an existing array, the entire array needs to be copied to a new block of memory, creating gaps for the new elements to be stored. This is very inefficient if done repeatedly to build an array.
In the case of adding rows, your best bet is to create an array that is as big as your data set will eventually be, and then assign data to it row-by-row:
>>> import numpy
>>> a = numpy.zeros(shape=(5,2))
>>> a
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
>>> a[0] = [1,2]
>>> a[1] = [2,3]
>>> a
array([[ 1.,  2.],
       [ 2.,  3.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
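When only an upper bound on the number of rows is known, the same idea can be sketched as: preallocate to the bound, fill row by row, then slice off the unused tail. The helper name fill_rows and its parameters are illustrative assumptions, not part of the answer above.

```python
import numpy as np

def fill_rows(source, max_rows, n_cols):
    """Copy rows from an iterable into a preallocated array (hypothetical helper)."""
    out = np.zeros((max_rows, n_cols))   # single allocation up front
    count = 0
    for row in source:
        if count == max_rows:
            raise ValueError("more rows than were preallocated")
        out[count] = row                 # in-place assignment, no reallocation
        count += 1
    return out[:count]                   # view of the rows that were actually filled

a = fill_rows([[1, 2], [2, 3]], max_rows=5, n_cols=2)
print(a.shape)
```

Slicing returns a view, so the trimmed result still shares memory with the larger buffer; call .copy() on it if the unused tail should be released.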
Solution 2:
A NumPy array is a very different data structure from a list and is designed to be used in different ways. Your use of hstack is potentially very inefficient: every time you call it, all the data in the existing array is copied into a new one. (The append function has the same issue.) If you want to build up your matrix one column at a time, you are probably best off keeping it in a list until it is finished, and only then converting it into an array.
e.g.
import numpy

mylist = []
for item in data:  # data: your iterable of rows or columns
    mylist.append(item)
mat = numpy.array(mylist)  # convert once, at the end
item can be a list, an array or any iterable, as long as each item has the same number of elements.
In this particular case (data is some iterable holding the matrix columns) you can simply use
mat = numpy.array(data)
(Also note that using list as a variable name is probably not good practice, since it masks the built-in type of that name, which can lead to bugs.)
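One detail worth checking when the collected items are meant to be columns: numpy.array treats each item as a row. A minimal sketch, using made-up data, of the two orientations (numpy.column_stack is one way to get the items as columns; transposing the result of numpy.array is another):

```python
import numpy as np

columns = []                          # plain Python list, cheap to append to
for k in range(3):                    # hypothetical data: three columns of length 2
    columns.append([k, k + 10])

as_rows = np.array(columns)           # each item becomes a row: shape (3, 2)
as_cols = np.column_stack(columns)    # each item becomes a column: shape (2, 3)
print(as_rows.shape, as_cols.shape)
```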
EDIT:
If for some reason you really do want to create an empty array, you can just use numpy.array([]), but this is rarely useful!
Solution 3:
To create an empty multidimensional array in NumPy (e.g. a 2D array m*n to store your matrix) when you don't know m, the number of rows you will append, and don't care about the computational cost Stephen Simmons mentioned (namely rebuilding the array at each append), you can set to 0 the dimension along which you want to append: X = np.empty(shape=[0, n]).
This way you can use, for example (here n = 2, and the loops append 10 rows in total, a number we assume we didn't know when creating the empty matrix):
import numpy as np
n = 2
X = np.empty(shape=[0, n])
for i in range(5):
    for j in range(2):
        # each call returns a new array with one more row
        X = np.append(X, [[i, j]], axis=0)
print(X)
which will give you:
[[ 0.  0.]
 [ 0.  1.]
 [ 1.  0.]
 [ 1.  1.]
 [ 2.  0.]
 [ 2.  1.]
 [ 3.  0.]
 [ 3.  1.]
 [ 4.  0.]
 [ 4.  1.]]
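To make the cost mentioned above concrete, here is a small sketch showing that np.append returns a new array rather than growing X in place, next to a list-based version that builds the same 10 rows with a single conversion. The names Y, rows and Z are illustrative.

```python
import numpy as np

n = 2
X = np.empty(shape=[0, n])
Y = np.append(X, [[0, 1]], axis=0)    # returns a new array; X itself is unchanged
print(X.shape, Y.shape)

# Same rows collected in a list and converted once at the end
rows = [[i, j] for i in range(5) for j in range(2)]
Z = np.array(rows, dtype=float)
print(Z.shape)
```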
Solution 4:
I looked into this a lot because I needed to use a numpy.array as a set in one of my school projects, and it needed to be initialized empty. I didn't find any relevant answer here on Stack Overflow, so I started doodling something.
# Initialize your variable as an empty list first
In [32]: x=[]
# and now cast it as a numpy ndarray
In [33]: x=np.array(x)
The result will be:
In [34]: x
Out[34]: array([], dtype=float64)
Therefore you can directly initialize an np array as follows:
In [36]: x= np.array([], dtype=np.float64)
I hope this helps.
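For the set-like use case mentioned above, one possible sketch (an assumption about what was needed, not the original project's code) is to start from an empty 1-D array and add values with np.union1d, which returns the sorted unique values of its two inputs:

```python
import numpy as np

s = np.array([], dtype=np.int64)      # empty 1-D array, shape (0,)
for value in [3, 1, 3, 2]:            # hypothetical input values
    s = np.union1d(s, [value])        # new array each time: sorted, duplicates removed
print(s.tolist())
```

Each call allocates a new array, so for many insertions a built-in set converted to an array at the end is likely cheaper.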
Solution 5:
You can use the append function. For rows:
>>> from numpy import *
>>> a = array([[10,20,30]])
>>> a = append(a, [[1,2,3]], axis=0)
>>> a
array([[10, 20, 30],
       [ 1,  2,  3]])
For columns:
>>> append(a, [[15],[15]], axis=1)
array([[10, 20, 30, 15],
       [ 1,  2,  3, 15]])
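A caveat when using append this way: if the axis argument is omitted, both inputs are flattened before joining, so the 2D structure is lost. A short sketch of the difference (variable names are illustrative):

```python
import numpy as np

a = np.array([[10, 20, 30]])
flat = np.append(a, [[1, 2, 3]])              # no axis: inputs are flattened first
stacked = np.append(a, [[1, 2, 3]], axis=0)   # axis=0: a new row is added
print(flat.shape, stacked.shape)
```

With axis given, both inputs must have the same number of dimensions, which is why the appended row is written as [[1, 2, 3]] rather than [1, 2, 3].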
EDIT
Of course, as mentioned in other answers, unless you're doing some processing (e.g. inversion) on the matrix/array every time you append something to it, I would just create a list, append to it, then convert it to an array.