How to get field of nested numpy structured array (advanced indexing)

Solution 1:

I don't know if displaying the resulting array helps you visualize the nesting or not.

In [279]: c = [('x','f8'),('y','f8')]
     ...: A = [('data_string','|S20'),('data_val', c, 2)]
     ...: arr = np.zeros(2, dtype=A)
In [280]: arr
Out[280]: 
array([(b'', [(0., 0.), (0., 0.)]), (b'', [(0., 0.), (0., 0.)])],
      dtype=[('data_string', 'S20'), ('data_val', [('x', '<f8'), ('y', '<f8')], (2,))])

Note how the nesting of () and [] reflects the nesting of the fields.

arr.dtype only has direct access to the top level field names:

In [281]: arr.dtype.names
Out[281]: ('data_string', 'data_val')
In [282]: arr['data_val']
Out[282]: 
array([[(0., 0.), (0., 0.)],
       [(0., 0.), (0., 0.)]], dtype=[('x', '<f8'), ('y', '<f8')])

But having accessed one field, we can then look at its fields:

In [283]: arr['data_val'].dtype.names
Out[283]: ('x', 'y')
In [284]: arr['data_val']['x']
Out[284]: 
array([[0., 0.],
       [0., 0.]])

Record number indexing is separate, and can be multidimensional in the usual sense:

In [285]: arr[1]['data_val']['x'] = [1,2]
In [286]: arr[0]['data_val']['y'] = [3,4]
In [287]: arr
Out[287]: 
array([(b'', [(0., 3.), (0., 4.)]), (b'', [(1., 0.), (2., 0.)])],
      dtype=[('data_string', 'S20'), ('data_val', [('x', '<f8'), ('y', '<f8')], (2,))])

Since the data_val field has a (2,) shape, we can mix/match that index with the (2,) shape of arr:

In [289]: arr['data_val']['x']
Out[289]: 
array([[0., 0.],
       [1., 2.]])
In [290]: arr['data_val']['x'][[0,1],[0,1]]
Out[290]: array([0., 2.])
In [291]: arr['data_val'][[0,1],[0,1]]
Out[291]: array([(0., 3.), (2., 0.)], dtype=[('x', '<f8'), ('y', '<f8')])

I mentioned that fields indexing is like dict indexing. Note this display of the fields:

In [294]: arr.dtype.fields
Out[294]: 
mappingproxy({'data_string': (dtype('S20'), 0),
              'data_val': (dtype(([('x', '<f8'), ('y', '<f8')], (2,))), 20)})

Each record is stored as a block of 52 bytes:

In [299]: arr.itemsize
Out[299]: 52
In [300]: arr.dtype.str
Out[300]: '|V52'

20 of those are data_string, and 32 are the 2 c fields

In [303]: arr['data_val'].dtype.str
Out[303]: '|V16'

You can ask for a list of fields, and get a special kind of view. Its dtype display is a little different

In [306]: arr[['data_val']]
Out[306]: 
array([([(0., 3.), (0., 4.)],), ([(1., 0.), (2., 0.)],)],
      dtype={'names': ['data_val'], 'formats': [([('x', '<f8'), ('y', '<f8')], (2,))], 'offsets': [20], 'itemsize': 52})

In [311]: arr['data_val'][['y']]
Out[311]: 
array([[(3.,), (4.,)],
       [(0.,), (0.,)]],
      dtype={'names': ['y'], 'formats': ['<f8'], 'offsets': [8], 'itemsize': 16})

Each 'data_val' starts 20 bytes into the 52 byte record. And each 'y' starts 8 bytes into its 16 byte record.

Solution 2:

The statement zeros['data_val'] creates a view into the array, which may already be non-contiguous at that point. You can extract multiple values of x because c is an array type, meaning that x has clearly defined strides and shape. The semantics of the statement zeros[:, 'x'] are very unclear. For example, what happens to data_string, which has no x? I would expect an error; you might expect something else.

The only way I can see the index being simplified, is if you expand c into A directly, sort of like an anonymous structure in C, except you can't do that easily with an array.