Read MATLAB v7.3 file into Python list of NumPy arrays via h5py

I found a solution to my problem. If anyone has a better one, or can explain it better, I'd still like to hear it.

Basically, each <HDF5 object reference> has to be used to index the h5py file object, which returns the underlying dataset that the reference points to. Once we have that dataset, it has to be loaded into memory by indexing it with [:], or with any subset if only part of the array is required. Here is what I mean:

import h5py

with h5py.File("f.mat", "r") as f:
    data = [f[element[0]][:] for element in f['rank']]

and the result:

In [79]: data[0].shape
Out[79]: (50L, 53L)

In [80]: data[0].dtype
Out[80]: dtype('float64')
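
If only part of each array is needed, the same idea can be wrapped in a small helper. This is just a sketch: the name load_cell and its selection parameter are mine, not part of h5py, and it assumes (as above) that each row of the cell dataset holds a single object reference.

```python
def load_cell(f, name, selection=slice(None)):
    """Dereference every entry of the cell array f[name].

    f is assumed to be an open h5py.File (anything indexable the same
    way works). `selection` is applied to each referenced dataset, so
    only that part of each array is read into memory.
    """
    # row[0] is the object reference; f[reference] is the target dataset
    return [f[row[0]][selection] for row in f[name]]


# Usage sketch (assumes h5py is installed and 'f.mat' contains 'rank'):
#   import h5py
#   with h5py.File("f.mat", "r") as f:
#       everything = load_cell(f, "rank")              # same as [:]
#       partial = load_cell(f, "rank", slice(0, 10))   # first 10 along axis 0
```

The default slice(None) behaves like [:], so calling it without a selection should give the same list as the one-liner above.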

Hope this helps anyone in the future. I think this is the most general solution I've seen so far.


Just by way of comparison, in Octave I created and wrote:

X = cell(1,10);
for i = 1:10
   X{i} = ones(i,i);
end
save Xcell1 -hdf5 X

then in Python:

import h5py

f = h5py.File('Xcell1', 'r')
grp = f['X']
grpv = grp['value']
X = list(grpv.items())
[x[1]['value'][()] for x in X[:-1]]  # list of those 10 arrays; [()] reads a whole dataset (.value was removed in h5py 3.0)

X[-1][1][()]  # (10,1) the cell array shape

or in one line

X = [f['/X/value/_0{}/value'.format(i)][()] for i in range(0,10)]
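
To avoid hard-coding the 10 and the _0{} name pattern, the group members can be enumerated instead. A rough sketch, assuming the layout seen in this particular file (elements stored as X/value/_NN/value, plus a 'dims' entry holding the cell shape) and that the element names are zero-padded so sorting them gives the cell order; read_octave_cell is a name I made up:

```python
def read_octave_cell(f, name):
    """Return (list of arrays, dims) for a cell array saved by Octave -hdf5.

    f is assumed to be an open h5py.File. The layout (elements under
    name/value/_NN/value, shape under name/value/dims) is taken from the
    example file here and may not hold for every Octave version.
    """
    members = f[name]["value"]
    dims = members["dims"][()]  # [()] reads the whole dataset
    # every member except 'dims' is one cell element
    keys = sorted(k for k in members.keys() if k != "dims")
    arrays = [members[k]["value"][()] for k in keys]
    return arrays, dims


# Usage sketch (assumes h5py is installed and 'Xcell1' was written as above):
#   import h5py
#   with h5py.File("Xcell1", "r") as f:
#       arrays, dims = read_octave_cell(f, "X")
```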

With a callback function that I wrote for https://stackoverflow.com/a/27699851/901925, the file can be viewed with:

f.visititems(callback)

producing:

name: X
type: b'cell'
name: X/value/_00
type: b'scalar'
1.0
name: X/value/_01
type: b'matrix'
[[ 1.  1.]
 [ 1.  1.]]
name: X/value/_02
type: b'matrix'
[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
name: X/value/_03
...
dims: [10  1]
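
The callback that produced this listing is in the linked answer. For a quick look at any HDF5 file, a much simpler stand-in can be passed to visititems, which calls it with (name, object) for every member. The function below is my own hypothetical sketch, not the linked one, and it prints a different format (shapes and dtypes only, no types or values):

```python
def describe(name, obj):
    """Print one line per visited object; meant for File.visititems."""
    if hasattr(obj, "shape"):
        # datasets expose shape and dtype
        print("dataset: {}  shape={} dtype={}".format(name, obj.shape, obj.dtype))
    else:
        # anything else (groups, mostly)
        print("group:   {}".format(name))
    return None  # returning None lets visititems continue the walk


# Usage sketch (f is an open h5py.File):
#   f.visititems(describe)
```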

Try mat73; it works like a charm.

pip install mat73

import mat73
data_dict = mat73.loadmat('train/digitStruct.mat')
