How to deal with hdf5 files in R?
Solution 1:
The rhdf5 package works really well, although it is not on CRAN. Install it from Bioconductor:
# as of 2020-09-08, these are the updated instructions per
# https://bioconductor.org/install/
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# install rhdf5 itself for the Bioconductor release that matches your R installation
BiocManager::install("rhdf5")
And to use it:
library(rhdf5)
List the objects within the file to find the data group you want to read:
h5ls("path/to/file.h5")
Read the HDF5 data:
mydata <- h5read("path/to/file.h5", "/mygroup/mydata")
And inspect the structure:
str(mydata)
Note that multidimensional arrays may appear transposed, because HDF5 stores arrays in row-major (C) order while R uses column-major order. You can also read whole groups, which are returned as named lists in R.
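A minimal self-contained round trip, as a sketch: it assumes rhdf5 is installed and writes to a temporary file, and the group and dataset names (`mygroup`, `mydata`, `labels`) are only illustrative. It also uses the `index` argument of `h5read()` to read just part of a dataset, and shows that reading a group gives back a named list.

```r
library(rhdf5)

# Hypothetical scratch file so the example does not touch real data
path <- tempfile(fileext = ".h5")
h5createFile(path)
h5createGroup(path, "mygroup")

# Write a 4 x 3 integer matrix and a character vector into the group
mat <- matrix(1:12, nrow = 4)
h5write(mat, path, "mygroup/mydata")
h5write(c("a", "b"), path, "mygroup/labels")

# Inspect the file layout
h5ls(path)

# Read one dataset back in full
mydata <- h5read(path, "/mygroup/mydata")

# Read only rows 1:2 and columns 2:3 (one index vector per dimension)
part <- h5read(path, "/mygroup/mydata", index = list(1:2, 2:3))

# Reading a group returns a named list of its members
grp <- h5read(path, "/mygroup")
str(grp)

# Release any HDF5 handles still open
h5closeAll()
```

Because the matrix was written from R, it comes back with its original shape; the transposition mentioned above typically shows up with files written by row-major tools.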
Solution 2:
You could also use h5, a package which I recently published on CRAN.
Compared to rhdf5, it has the following features:
- S4 object model to directly interact with HDF5 objects like files, groups, datasets and attributes.
- Simpler syntax, with R-like subsetting operators for datasets that support commands like
    readdata <- dataset[1:3, 1:3]
    dataset[1:3, 1:3] <- matrix(1:9, nrow = 3)
- Support for NA values in all data types
- 200+ test cases with a code coverage of 80%+.
To save a matrix you could use:
library(h5)
testmat <- matrix(rnorm(120), ncol = 3)
# Create HDF5 File
file <- h5file("test.h5")
# Save matrix to file in group 'testgroup' and datasetname 'testmat'
file["testgroup", "testmat"] <- testmat
# Close file
h5close(file)
... and read the entire matrix back into R:
file <- h5file("test.h5")
testmat_in <- file["testgroup", "testmat"][]
h5close(file)
See also h5 on
- CRAN: http://cran.r-project.org/web/packages/h5/index.html
- Github: https://github.com/mannau/h5
Solution 3:
I used the rgdal package to read HDF5 files. Be aware that the binary version of rgdal probably does not support HDF5. In that case, you need to build GDAL from source with HDF5 support before building rgdal from source.
Alternatively, try converting the files from HDF5 to netCDF. Once they are in netCDF, you can use the excellent ncdf package to access the data. I think the conversion could be done with the cdo tool.
Solution 4:
The ncdf4 package, an interface to netCDF-4, can also be used to read HDF5 files (netCDF-4 is compatible with netCDF-3, but it uses HDF5 as the storage layer).
In the developers' words:
- the HDF group says:
NetCDF-4 combines the netCDF-3 and HDF5 data models, taking the desirable characteristics of each, while taking advantage of their separate strengths
- Unidata says:
The netCDF-4 format implements and expands the netCDF-3 data model by using an enhanced version of HDF5 as the storage layer.
In practice, ncdf4 provides a simple interface, and migrating our code from the older hdf5 and ncdf packages to the single ncdf4 package has made it less buggy and easier to write (some of my trials and workarounds are documented in my previous answer).
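As a rough sketch of that interface, the example below writes a small netCDF-4 file (an HDF5 file underneath, because `force_v4 = TRUE`) and reads it back. The file, dimension and variable names are hypothetical. For an existing file you would only need the reading half: `nc_open()`, `names(nc$var)`, `ncvar_get()` and `nc_close()`.

```r
library(ncdf4)

# Hypothetical scratch file
path <- tempfile(fileext = ".nc")

# Define two dimensions and one double-precision variable on them
x <- ncdim_def("x", units = "1", vals = 1:4)
y <- ncdim_def("y", units = "1", vals = 1:3)
v <- ncvar_def("mydata", units = "1", dim = list(x, y),
               missval = -999, prec = "double")

# force_v4 = TRUE writes the netCDF-4 (HDF5-based) format
nc <- nc_create(path, v, force_v4 = TRUE)
ncvar_put(nc, v, matrix(as.numeric(1:12), nrow = 4))
nc_close(nc)

# Reading: open the file, list its variables, pull one out, close it
nc <- nc_open(path)
print(names(nc$var))
mydata <- ncvar_get(nc, "mydata")
nc_close(nc)
str(mydata)
```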