NumPy array/matrix of mixed types
I'm trying to create a NumPy array/matrix (Nx3) with mixed data types (string, integer, integer). But when I try to append data to this matrix, I get an error: TypeError: invalid type promotion. Can anybody help me solve this problem?
When I create an array with the sample data, NumPy casts all columns in the matrix to a single 'S' data type. And I can't specify the data types for the array, because when I do res = np.array(["TEXT", 1, 1], dtype='S, i4, i4') I get an error: TypeError: expected a readable buffer object
templates.py
import numpy as np
from pprint import pprint
test_array = np.zeros((0, 3), dtype='S, i4, i4')
pprint(test_array)
test_array = np.append(test_array, [["TEXT", 1, 1]], axis=0)
pprint(test_array)
print("Array example:")
res = np.array(["TEXT", 1, 1])
pprint(res)
Output:
array([], shape=(0L, 3L),
dtype=[('f0', 'S'), ('f1', '<i4'), ('f2', '<i4')])
Array example:
array(['TEXT', '1', '1'], dtype='|S4')
Error:
Traceback (most recent call last):
File "templates.py", line 5, in <module>
test_array = np.append(test_array, [["TEXT", 1, 1]], axis=0)
File "lib\site-packages\numpy\lib\function_base.py", line 3543, in append
return concatenate((arr, values), axis=axis)
TypeError: invalid type promotion
Your problem is in the data. Try this:
res = np.array(("TEXT", 1, 1), dtype='|S4, i4, i4')
or
res = np.array([("TEXT", 1, 1), ("XXX", 2, 2)], dtype='|S4, i4, i4')
The data has to be a tuple or a list of tuples. Not quite evident from the error message, is it?
Also, please note that the length of the text field has to be specified for the text data to really be saved. If you want to save the text as objects (only references in the array), then:
res = np.array([("TEXT", 1, 1), ("XXX", 2, 2)], dtype='object, i4, i4')
This is often quite useful, as well.
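To put the three variants above side by side, here is a minimal sketch. It assumes Python 3, where NumPy stores the 'S' fields as bytes, and it uses the short 'O' code for the object field. The variable names one, many and objs are only illustrative.

```python
import numpy as np

# One record: pass a single tuple.
one = np.array(("TEXT", 1, 1), dtype='S4, i4, i4')

# Several records: pass a list of tuples.
many = np.array([("TEXT", 1, 1), ("XXX", 2, 2)], dtype='S4, i4, i4')

# Object field: the array holds references, so strings of any length fit.
objs = np.array([("TEXT", 1, 1), ("a much longer string", 2, 2)],
                dtype='O, i4, i4')

print(one.shape, many.shape)  # a 0-d record and a 1-d array of records
print(objs['f0'][1])          # the full string is kept
```

The trade-off is that the object field stores Python objects rather than raw bytes, so the array is no longer a plain block of fixed-size records.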
If you're not married to numpy, a pandas DataFrame is perfect for this. Alternatively, you can specify the string field in the array as a Python object (dtype='O, i4, i4', for example). Also, append seems to like lists of tuples, not lists of lists. I think it has something to do with the mutability of lists, but I'm not sure.
First, numpy stores array elements using fixed physical record sizes. So, record objects need to all be the same physical size. For this reason, you need to tell numpy the size of the string or save a pointer to a string stored somewhere else. In a record array, 'S' translates into a zero-length string, and that's probably not what you intended.
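A quick way to see the fixed record size is to check the dtype's itemsize and to store a string that is too long for the field. This is only a sketch, assuming Python 3 and the default unaligned layout; rec and arr are illustrative names.

```python
import numpy as np

rec = np.dtype('S10, i4, i4')
print(rec.itemsize)  # 10 bytes of text + 4 + 4 bytes for the two integers

# A string longer than the field is silently cut to the field width.
arr = np.array([('abcdefghijklmnop', 1, 1)], dtype=rec)
print(arr['f0'][0])  # only the first 10 bytes are stored
```

So the string length in the dtype really is a hard limit per record, not a hint.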
The append method actually copies the entire array to a larger physical space to accommodate the new elements. Try, for example:
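import numpy as np
mtype = 'S10, i4, i4'
# Start with an empty 1-d structured array.
ta = np.zeros((0,), dtype=mtype)
print(id(ta))
# Each append returns a new, larger array rather than growing the old one.
ta = np.append(ta, np.array([('first', 10, 11)], dtype=mtype))
print(id(ta))
ta = np.append(ta, np.array([('second', 20, 21)], dtype=mtype))
print(id(ta))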
Each time you append this way, the copy gets slower because you need to allocate and copy more memory each time it grows. That's why the id returns a different value every time you append. If you want any significant number of records in your array, you are much better off either allocating enough space from the start, or else accumulating the data in lists and then collecting the lists into a numpy structured array when you're done. That also gives you the opportunity to make the string length in mtype as short as possible, while still long enough to hold your longest string.
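A minimal sketch of the "accumulate in lists, convert once" approach described above. The sample rows and the names rows, width and mtype are made up for illustration, and the width calculation assumes plain ASCII text (one byte per character).

```python
import numpy as np

# Collect the records in an ordinary Python list; appending here is cheap.
rows = []
for name, a, b in [('first', 10, 11), ('second', 20, 21), ('third', 30, 31)]:
    rows.append((name, a, b))

# Size the string field to the longest string actually seen.
width = max(len(name) for name, _, _ in rows)
mtype = 'S{}, i4, i4'.format(width)

# Build the structured array once, at the end.
ta = np.array(rows, dtype=mtype)
print(ta.dtype)
print(ta)
```

This way the data is copied into NumPy memory once, instead of once per appended record.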
I think this is what you are trying to accomplish - create an empty array of the desired dtype, and then add one or more data sets to it. The result will have shape (N,), not (N,3).
As I noted in a comment, np.append uses np.concatenate, so I am using that too. Also I have to make both test_array and x 1d arrays (shape (0,) and (1,) respectively). And the dtype field is S10, large enough to contain 'TEXT'.
In [56]: test_array = np.zeros((0,), dtype='S10, i4, i4')
In [57]: x = np.array([("TEST",1,1)], dtype='S10, i4, i4')
In [58]: test_array = np.concatenate((test_array, x))
In [59]: test_array = np.concatenate((test_array, x))
In [60]: test_array
Out[60]:
array([('TEST', 1, 1), ('TEST', 1, 1)],
dtype=[('f0', 'S10'), ('f1', '<i4'), ('f2', '<i4')])
Here's an example of building the array from a list of tuples:
In [75]: xl=('test',1,1)
In [76]: np.array([xl]*3,dtype='S10,i4,i4')
Out[76]:
array([('test', 1, 1), ('test', 1, 1), ('test', 1, 1)],
dtype=[('f0', 'S10'), ('f1', '<i4'), ('f2', '<i4')])
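Since the result is 1d with shape (N,), the "columns" are reached by field name rather than by a second index. A small sketch, assuming Python 3 and the default field names f0, f1, f2 that NumPy assigns to a comma-separated dtype string:

```python
import numpy as np

arr = np.array([('test', 1, 1)] * 3, dtype='S10,i4,i4')

print(arr.shape)   # one axis only: N records, not N x 3
print(arr['f1'])   # the first integer "column" as a plain int32 array
print(arr[0])      # a single record
print(arr[0]['f0'])  # one field of one record
```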