How to read HDF5 files in Python
I am trying to read data from an HDF5 file in Python. I can open the file using h5py, but I cannot figure out how to access the data within it.
My code:
```python
import h5py
import numpy as np
f1 = h5py.File(file_name,'r+')
```
This works and the file is read. But how can I access the data inside the file object f1?
Solution 1:
Read HDF5
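```python
import h5py

filename = "file.hdf5"

with h5py.File(filename, "r") as f:
    # List all top-level keys; each one names a group or a dataset
    print("Keys: %s" % list(f.keys()))
    a_group_key = list(f.keys())[0]

    # Get the data: for a group, list() gives the names of its members;
    # for a dataset, [()] reads its contents into a NumPy array
    obj = f[a_group_key]
    if isinstance(obj, h5py.Dataset):
        data = obj[()]
    else:
        data = list(obj)
```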
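Because a top-level key can name either a group or a dataset, it helps to check the object type before reading. A minimal sketch, assuming a throwaway file `example.hdf5` with a hypothetical group `measurements` and dataset `matrix` that the script creates itself:

```python
import h5py
import numpy as np

# Build a small example file (hypothetical names, for illustration only)
with h5py.File("example.hdf5", "w") as f:
    grp = f.create_group("measurements")
    grp.create_dataset("temperature", data=np.arange(5.0))
    f.create_dataset("matrix", data=np.eye(3))

# Read it back, handling groups and datasets differently
with h5py.File("example.hdf5", "r") as f:
    for key in f.keys():
        obj = f[key]
        if isinstance(obj, h5py.Group):
            # A group: list the names of its members
            print(key, "is a group with members", list(obj.keys()))
        elif isinstance(obj, h5py.Dataset):
            # A dataset: [()] reads the whole thing into a NumPy array
            print(key, "is a dataset with shape", obj[()].shape)
    # Path-style access reaches into nested groups directly
    temperature = f["measurements/temperature"][()]
    matrix = f["matrix"][()]
```

The arrays returned by `[()]` are in-memory copies, so they remain usable after the `with` block closes the file.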
Write HDF5
```python
import h5py
import numpy as np

# Create random data
data_matrix = np.random.uniform(-1, 1, size=(10, 3))

# Write data to HDF5 (create_dataset creates a dataset, not a group)
with h5py.File("file.hdf5", "w") as data_file:
    data_file.create_dataset("dataset_name", data=data_matrix)
```
See h5py docs for more information.
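To confirm that writing and reading fit together, here is a small round-trip sketch. The file name `roundtrip.hdf5` and dataset name `data` are arbitrary choices for this example:

```python
import h5py
import numpy as np

data_matrix = np.random.uniform(-1, 1, size=(10, 3))

# Write the array as a dataset named "data" (name chosen for this example)
with h5py.File("roundtrip.hdf5", "w") as data_file:
    data_file.create_dataset("data", data=data_matrix)

# Read it back: [()] copies the whole dataset into memory,
# while ordinary slicing reads only the requested part
with h5py.File("roundtrip.hdf5", "r") as data_file:
    loaded = data_file["data"][()]
    first_row = data_file["data"][0]

print(loaded.shape, loaded.dtype)
```

Slicing is what makes HDF5 convenient for large matrices: you can read one row or block without loading the full dataset.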
Alternatives
- JSON: Nice for writing human-readable data; VERY commonly used (read & write)
- CSV: Super simple format (read & write)
- pickle: A Python serialization format (read & write)
- MessagePack (Python package): More compact representation (read & write)
- HDF5 (Python package): Nice for matrices (read & write)
- XML: exists too *sigh* (read & write)
For your application, the following might be important:
- Support by other programming languages
- Reading / writing performance
- Compactness (file size)
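As a rough illustration of the compactness point, the sketch below writes the same random float64 matrix as JSON, CSV (via `np.savetxt` with its default format) and HDF5, then prints the file sizes. The file names are arbitrary, no compression is used, and the numbers will differ for other data:

```python
import json
import os

import h5py
import numpy as np

matrix = np.random.uniform(-1, 1, size=(1000, 3))

# Same data in three formats (file names are arbitrary)
with open("matrix.json", "w") as fh:
    json.dump(matrix.tolist(), fh)  # floats stored as text
np.savetxt("matrix.csv", matrix, delimiter=",")  # text, default '%.18e' format
with h5py.File("matrix.hdf5", "w") as f:
    f.create_dataset("matrix", data=matrix)  # raw binary float64

sizes = {name: os.path.getsize(name)
         for name in ("matrix.json", "matrix.csv", "matrix.hdf5")}
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```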
See also: Comparison of data serialization formats
If you are instead looking for a way to create configuration files, you might want to read my short article Configuration files in Python.
Solution 2:
Reading the file
```python
import h5py
f = h5py.File(file_name, "r")  # file_name: path to your file; "r" opens it read-only
```
Study the structure of the file by printing the top-level keys (the names of the HDF5 groups or datasets present):
```python
for key in f.keys():
    print(key)  # Names of the top-level groups/datasets in the HDF5 file
```
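`f.keys()` only lists the top level. To see the whole hierarchy at once, h5py's `visititems` calls a function for every group and dataset in the file. A minimal sketch, assuming a small nested example file whose structure is made up for illustration:

```python
import h5py
import numpy as np

# Small nested example file (hypothetical structure, for illustration only)
with h5py.File("nested.hdf5", "w") as f:
    f.create_dataset("top_level", data=np.zeros(3))
    # Intermediate groups in the path are created automatically
    f.create_dataset("run1/raw/signal", data=np.ones((2, 4)))
    f.create_dataset("run1/summary", data=np.array([1.5]))

found = []

def describe(name, obj):
    # Called once for every group and dataset below the root
    if isinstance(obj, h5py.Dataset):
        print(f"dataset {name}: shape={obj.shape}, dtype={obj.dtype}")
        found.append(name)
    else:
        print(f"group   {name}")

with h5py.File("nested.hdf5", "r") as f:
    f.visititems(describe)
```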
Extracting the data
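```python
# Get the HDF5 group (here: the last key printed by the loop above)
group = f[key]
# Check what keys are inside that group
for inner_key in group.keys():
    print(inner_key)
# some_key_inside_the_group is a placeholder: use one of the names printed above
data = group[some_key_inside_the_group][()]
# Do whatever you want with data
# After you are done
f.close()
```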
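A variant of the same steps that uses a `with` block, so the file is closed even if an error occurs, and collects every dataset directly inside one group into a dict. The file, group and dataset names are hypothetical and the helper `read_group` is written just for this sketch:

```python
import h5py
import numpy as np

# Example file with one group holding two datasets (names are hypothetical)
with h5py.File("grouped.hdf5", "w") as f:
    grp = f.create_group("experiment")
    grp.create_dataset("x", data=np.linspace(0.0, 1.0, 5))
    grp.create_dataset("y", data=np.linspace(0.0, 2.0, 5))

def read_group(path, group_name):
    """Return {dataset name: NumPy array} for the datasets directly inside a group."""
    with h5py.File(path, "r") as f:  # closed automatically, even on errors
        group = f[group_name]
        return {
            name: obj[()]  # copy into memory before the file closes
            for name, obj in group.items()
            if isinstance(obj, h5py.Dataset)
        }

arrays = read_group("grouped.hdf5", "experiment")
print(sorted(arrays))
```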
Solution 3:
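You can use pandas. Note that pd.read_hdf requires the PyTables package and is designed for files written by pandas itself (e.g. with DataFrame.to_hdf).
```python
import pandas as pd
df = pd.read_hdf(filename, key)
```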
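For an HDF5 file that was not written by pandas, one workable alternative is to read the array with h5py and wrap it in a DataFrame yourself. A minimal sketch; the dataset name `table` and the column names are assumptions made up for this example:

```python
import h5py
import numpy as np
import pandas as pd

# A file written with h5py, not pandas (dataset name "table" is hypothetical)
with h5py.File("plain.hdf5", "w") as f:
    f.create_dataset("table", data=np.arange(6.0).reshape(3, 2))

# Read the raw array with h5py, then wrap it; column names are our own choice
with h5py.File("plain.hdf5", "r") as f:
    df = pd.DataFrame(f["table"][()], columns=["a", "b"])

print(df)
```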
Solution 4:
Here's a simple function I just wrote which reads a .hdf5 file generated by the save_weights function in Keras and returns a dict mapping layer names to weights:
```python
import h5py

def read_hdf5(path):
    weights = {}
    keys = []
    with h5py.File(path, 'r') as f:  # open file
        f.visit(keys.append)  # append all object names (groups and datasets) to list
        for key in keys:
            if ':' in key:  # Keras weight datasets have names like 'kernel:0'
                print(f[key].name)
                # Dataset.value was removed in h5py 3.0; [()] reads the data instead
                weights[f[key].name] = f[key][()]
    return weights
```
https://gist.github.com/Attila94/fb917e03b04035f3737cc8860d9e9f9b.
Haven't tested it thoroughly, but it does the job for me.
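The `':' in key` check relies on Keras's naming convention for weights. A variant that does not depend on names is to select objects by type with `visititems`, which collects every dataset in the file. In this sketch the helper `read_all_datasets` is hypothetical, and the example file only imitates a Keras-style layout; it is not a real Keras output:

```python
import h5py
import numpy as np

def read_all_datasets(path):
    """Return {absolute dataset path: NumPy array} for every dataset in the file."""
    arrays = {}

    def collect(name, obj):
        if isinstance(obj, h5py.Dataset):
            arrays[obj.name] = obj[()]  # obj.name is the absolute path, e.g. '/dense/...'

    with h5py.File(path, "r") as f:
        f.visititems(collect)
    return arrays

# Hypothetical file imitating a Keras-style layout, for illustration only
with h5py.File("weights_example.h5", "w") as f:
    f.create_dataset("dense/dense/kernel:0", data=np.ones((4, 2)))
    f.create_dataset("dense/dense/bias:0", data=np.zeros(2))

weights = read_all_datasets("weights_example.h5")
for name, value in weights.items():
    print(name, value.shape)
```

This reads datasets only; any attributes stored on groups are ignored.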