How to deal with hdf5 files in R?

Solution 1:

The rhdf5 package works really well, although it is not on CRAN. Install it from Bioconductor:

# as of 2020-09-08, these are the updated instructions per
# https://bioconductor.org/install/

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("rhdf5")

And to use it:

library(rhdf5)

List the objects within the file to find the data group you want to read:

h5ls("path/to/file.h5")

Read the HDF5 data:

mydata <- h5read("path/to/file.h5", "/mygroup/mydata")

And inspect the structure:

str(mydata)

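Note that multidimensional arrays may appear transposed: HDF5 stores data in row-major order and R in column-major order, so rhdf5 reports dimensions in reverse compared with tools such as h5dump. You can also read a whole group, which is returned as a named list in R.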
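
To make the steps above concrete, here is a minimal round-trip sketch with rhdf5. The temporary file and the names mygroup and mydata are placeholders chosen for illustration, not anything from a real data set. It writes a small matrix, reads the whole group back as a named list, and then reads only part of the dataset through the index argument of h5read.

```r
library(rhdf5)

# Hypothetical scratch file, used only for this illustration
path <- tempfile(fileext = ".h5")
h5createFile(path)
h5createGroup(path, "mygroup")

# Write a small 2 x 3 integer matrix into the group
m <- matrix(1:6, nrow = 2, ncol = 3)
h5write(m, path, "mygroup/mydata")

# List the contents of the file
h5ls(path)

# Reading a group returns a named list with one element per dataset
grp <- h5read(path, "/mygroup")
str(grp)

# Read only rows 1:2 and columns 2:3 instead of the whole dataset
block <- h5read(path, "/mygroup/mydata", index = list(1:2, 2:3))

# Release any HDF5 handles that are still open
h5closeAll()
```

Reading with index is worth considering for large files, since only the requested block is loaded into memory.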

Solution 2:

You could also use h5, a package which I recently published on CRAN. Compared to rhdf5 it has the following features:

  1. An S4 object model to interact directly with HDF5 objects such as files, groups, datasets and attributes.
  2. Simpler syntax, with R-like subsetting operators for datasets, supporting commands like
       readdata <- dataset[1:3, 1:3]
       dataset[1:3, 1:3] <- matrix(1:9, nrow = 3)
  3. Support for NA values for all data types.
  4. More than 200 test cases with code coverage above 80%.

To save a matrix you could use:

library(h5)
testmat <- matrix(rnorm(120), ncol = 3)
# Create HDF5 File
file <- h5file("test.h5")
# Save matrix to file in group 'testgroup' and datasetname 'testmat'
file["testgroup", "testmat"] <- testmat
# Close file
h5close(file)

... and read the entire matrix back into R:

file <- h5file("test.h5")
testmat_in <- file["testgroup", "testmat"][]
h5close(file)

See also h5 on

  • CRAN: http://cran.r-project.org/web/packages/h5/index.html
  • Github: https://github.com/mannau/h5

Solution 3:

I used the rgdal package to read HDF5 files. Be aware that the binary version of rgdal probably does not support HDF5. In that case, you need to build GDAL from source with HDF5 support before building rgdal from source.

Alternatively, try converting the files from HDF5 to netCDF. Once they are in netCDF, you can use the excellent ncdf package to access the data. I think the conversion could be done with the cdo tool.

Solution 4:

The ncdf4 package, an interface to netCDF-4, can also be used to read HDF5 files (netCDF-4 is compatible with netCDF-3, but it uses HDF5 as the storage layer).

In the developers' words:

  • the HDF group says:

NetCDF-4 combines the netCDF-3 and HDF5 data models, taking the desirable characteristics of each, while taking advantage of their separate strengths

  • Unidata says:

The netCDF-4 format implements and expands the netCDF-3 data model by using an enhanced version of HDF5 as the storage layer.

In practice, ncdf4 provides a simple interface, and migrating code from using older hdf5 and ncdf packages to a single ncdf4 package has made our code less buggy and easier to write (some of my trials and workarounds are documented in my previous answer).
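
As a rough illustration of the storage-layer point, the sketch below writes a small netCDF-4 file with ncdf4, checks that the file starts with the 8-byte HDF5 signature, and reads the data back. The temporary file, the dimension names x and y, the units strings and the variable name testmat are all made up for the example. It assumes ncdf4 is linked against a netCDF library built with netCDF-4 (HDF5) support. Going the other way, ncdf4 may not be able to open every HDF5 file that was written by other tools.

```r
library(ncdf4)

# Hypothetical scratch file, used only for this illustration
path <- tempfile(fileext = ".nc")

# Define two dimensions and one 4 x 3 double variable (names are made up)
x <- ncdim_def("x", units = "index", vals = 1:4)
y <- ncdim_def("y", units = "index", vals = 1:3)
v <- ncvar_def("testmat", units = "none", dim = list(x, y), prec = "double")

# force_v4 = TRUE asks for the netCDF-4 format, which is stored as HDF5
nc <- nc_create(path, vars = v, force_v4 = TRUE)
vals <- matrix(as.numeric(1:12), nrow = 4)
ncvar_put(nc, v, vals)
nc_close(nc)

# Every HDF5 file begins with this 8-byte signature
hdf5_sig <- as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a))
is_hdf5 <- identical(readBin(path, what = "raw", n = 8), hdf5_sig)

# Reopen the file, list its variables and read one back
nc <- nc_open(path)
var_names <- names(nc$var)
testmat_in <- ncvar_get(nc, "testmat")
nc_close(nc)
```

The same nc_open, names(nc$var), ncvar_get and nc_close sequence is what you would try first on an existing HDF5 file.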