Introduction

HDF5 is a cross-platform data format for storing (high-dimensional) arrays. There are various language bindings for manipulating HDF5 files, including one for C++. After stumbling around for many hours, I record here how to read data using C++.

Read scalars

// note: the C++ API header is "H5Cpp.h", not the C API header "hdf5.h"
#include "H5Cpp.h"

int main()
{
	H5::H5File file("/path/to/data.h5", H5F_ACC_RDONLY);
	H5::DataSet dataset = file.openDataSet("dataset/path");
	H5::DataSpace filespace = dataset.getSpace();
	// a scalar dataspace has rank 0, so a 1-element array is more than
	// enough here
	hsize_t shape[1];
	// for a scalar dataset `_dims` is 0 and `shape` is left untouched
	int _dims = filespace.getSimpleExtentDims(shape);
	H5::DataSpace mspace(0, shape);  // rank 0, i.e. the value of `_dims`
	double buf[1];
	dataset.read(buf, H5::PredType::NATIVE_DOUBLE, mspace, filespace);

	// the scalar is in `buf[0]`

	return 0;
}

Read vector to array

#include "H5Cpp.h"

int main()
{
	H5::H5File file("/path/to/data.h5", H5F_ACC_RDONLY);
	H5::DataSet dataset = file.openDataSet("dataset/path");
	H5::DataSpace filespace = dataset.getSpace();
	// `1` is the rank: 1 for a 1D array (vector);
	// for a 2D array (matrix) replace `1` with `2`, and so forth
	hsize_t shape[1];
	// `_dims` is the actual N of the N-D dataset and should equal the rank
	// chosen above; if it is larger, this call writes past the end of
	// `shape`; `shape` now holds the extent of each dimension
	int _dims = filespace.getSimpleExtentDims(shape);
	H5::DataSpace mspace(1, shape); // likewise, `2` for a matrix
	double *buf = new double[shape[0]];
	// for a 2D array, replace the previous line with
	//double *buf = new double[shape[0] * shape[1]];
	// and so forth
	dataset.read(buf, H5::PredType::NATIVE_DOUBLE, mspace, filespace);

	// the vector (or the flattened, row-major matrix) is in `buf`

	delete[] buf;
	return 0;
}

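Note that HDF5 reads an array into a single contiguous buffer in row-major (C) order. Reading into something like double buf[M][N], with M and N known only at run time, is not allowed (variable-length arrays are not standard C++), and neither is an array of separately allocated rows such as double **buf. See this answer.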
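Because the buffer is flat and row-major, element (i, j) of an M x N matrix sits at `buf[i * N + j]`, and the same pattern extends to more dimensions. The following is a small sketch of that index arithmetic that does not depend on HDF5. `row_major_offset` is a hypothetical helper name and is not part of the HDF5 API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: flat offset of index `idx` into a row-major
// array with extents `dims`, i.e. ((i0 * d1 + i1) * d2 + i2) * ...
std::size_t row_major_offset(const std::vector<std::size_t>& dims,
                             const std::vector<std::size_t>& idx)
{
	assert(dims.size() == idx.size());
	std::size_t off = 0;
	for (std::size_t k = 0; k < dims.size(); ++k) {
		assert(idx[k] < dims[k]);  // index must be in range
		off = off * dims[k] + idx[k];
	}
	return off;
}
```

With `dims = {2, 3}` and `idx = {1, 2}` it returns 5, which is 1 * 3 + 2, the same as `buf[i * N + j]`.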

Read vector to std::vector

Basically the same …

#include "H5Cpp.h"
#include <vector>

int main()
{
	H5::H5File file("/path/to/data.h5", H5F_ACC_RDONLY);
	H5::DataSet dataset = file.openDataSet("dataset/path");
	H5::DataSpace filespace = dataset.getSpace();
	// `1` is the rank: 1 for a 1D array (vector);
	// for a 2D array (matrix) replace `1` with `2`, and so forth
	hsize_t shape[1];
	// `_dims` is the actual N of the N-D dataset and should equal the rank
	// chosen above; if it is larger, this call writes past the end of
	// `shape`; `shape` now holds the extent of each dimension
	int _dims = filespace.getSimpleExtentDims(shape);
	H5::DataSpace mspace(1, shape); // likewise, `2` for a matrix
	// size the vector (not just reserve()) so it holds every element
	std::vector<double> buf(shape[0]);
	// likewise, for a 2D array the previous line becomes
	//std::vector<double> buf(shape[0] * shape[1]);
	// and so forth
	// `.data()` (C++11) returns a pointer to the vector's contiguous storage
	dataset.read(buf.data(), H5::PredType::NATIVE_DOUBLE, mspace, filespace);

	// the vector is in `buf`

	return 0;
}

Compile the code above

I’m not sure how to compile on Windows, but on Linux and macOS a Makefile like this works:

LDFLAGS = \
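	-L/path/to/hdf5/install/directory/lib
# note the library names here: `-lhdf5` alone is not enough, since the
# C++ API lives in `hdf5_cpp`; libraries come before the ones they
# depend on, which matters when linking statically
LDLIBS = \
	-lhdf5_hl_cpp \
	-lhdf5_cpp \
	-lhdf5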
CPPFLAGS = \
	-I/path/to/hdf5/install/directory/include
CXX = clang++

# `std::vector::data()` is a C++11 feature, so keep `-std=c++11` for
# older compilers; recent GCC and Clang already default to C++14 or later
a.out : source.cpp
	$(CXX) $(CPPFLAGS) $(LDFLAGS) -std=c++11 -o $@ $^ $(LDLIBS)