AvogadroLibs 1.97.0
Hdf5DataFormat Class Reference

The Hdf5DataFormat class provides access to data stored in HDF5 files.

#include <avogadro/io/hdf5dataformat.h>

Public Types

enum  OpenMode {
  ReadOnly = 0 ,
  ReadWriteTruncate ,
  ReadWriteAppend
}
 

Public Member Functions

 ~Hdf5DataFormat ()
 
bool isOpen () const
 
bool openFile (const std::string &filename_, OpenMode mode=ReadWriteAppend)
 openFile Open a file for use by this reader/writer.
 
std::string filename () const
 
bool closeFile ()
 closeFile Close the file and reset the reader/writer. Another file may be opened after calling this function.
 
void setThreshold (size_t bytes)
 setThreshold Set the threshold size in bytes that will be used in the exceedsThreshold functions. The threshold can be used to determine which data is considered "large enough" to be stored in HDF5, rather than an accompanying format.
 
size_t threshold () const
 
bool exceedsThreshold (size_t bytes) const
 exceedsThreshold Test if a data set is "large enough" to be stored in HDF5 format. If this function returns true, the number of bytes tested is larger than the threshold and the data should be written into the HDF5 file. If false, the data should be written into the accompanying format.
 
bool exceedsThreshold (const MatrixX &data) const
 exceedsThreshold Test if a data set is "large enough" to be stored in HDF5 format. If this function returns true, the size of the data in the object is larger than the threshold and should be written into the HDF5 file. If false, the data should be written into the accompanying format.
 
bool exceedsThreshold (const std::vector< double > &data) const
 exceedsThreshold Test if a data set is "large enough" to be stored in HDF5 format. If this function returns true, the size of the data in the object is larger than the threshold and should be written into the HDF5 file. If false, the data should be written into the accompanying format.
 
bool exceedsThreshold (const Core::Array< double > &data) const
 exceedsThreshold Test if a data set is "large enough" to be stored in HDF5 format. If this function returns true, the size of the data in the object is larger than the threshold and should be written into the HDF5 file. If false, the data should be written into the accompanying format.
 
bool datasetExists (const std::string &path) const
 datasetExists Test if the currently open file contains a dataset at the HDF5 absolute path path.
 
bool removeDataset (const std::string &path) const
 removeDataset Remove a dataset from the currently opened file.
 
std::vector< int > datasetDimensions (const std::string &path) const
 datasetDimensions Find the dimensions of a dataset.
 
bool writeDataset (const std::string &path, const MatrixX &data) const
 writeDataset Write the data to the currently opened file at the specified absolute HDF5 path.
 
bool writeDataset (const std::string &path, const std::vector< double > &data, int ndims=1, size_t *dims=nullptr) const
 writeDataset Write the data to the currently opened file at the specified absolute HDF5 path.
 
bool writeDataset (const std::string &path, const Core::Array< double > &data, int ndims=1, size_t *dims=nullptr) const
 writeDataset Write the data to the currently opened file at the specified absolute HDF5 path.
 
bool readDataset (const std::string &path, MatrixX &data) const
 readDataset Populate the data container data with the data at the specified path in the currently opened HDF5 file.
 
std::vector< int > readDataset (const std::string &path, std::vector< double > &data) const
 readDataset Populate the data container data with the data at the specified path in the currently opened HDF5 file.
 
std::vector< int > readDataset (const std::string &path, Core::Array< double > &data) const
 readDataset Populate the data container data with the data at the specified path in the currently opened HDF5 file.
 
std::vector< std::string > datasets () const
 datasets Traverse the currently opened file and return a list of all dataset objects in the file.
 

Detailed Description

Author
Allison Vacanti

This class is intended to supplement an existing format reader/writer by providing the option to write large data to an HDF5 file store. The purpose is to keep text format files at a manageable size.

To use this class, open or create an HDF5 file with the openFile method, using the appropriate OpenMode for the intended operation. Data can be written to the file using the writeDataset methods and retrieved using the readDataset methods. When finished, call closeFile to release the file resources from the HDF5 library.
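The open, write, read, close sequence described above can be sketched as follows. So that the example compiles without AvogadroLibs or HDF5, it uses InMemoryStore, a hypothetical stand-in that mirrors only a few of the documented signatures; roundTrip and the file and dataset names are likewise illustrative. With the real library, an Hdf5DataFormat instance would presumably be passed in its place.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in, NOT part of AvogadroLibs. It mirrors a
// subset of the documented Hdf5DataFormat interface so the sketch compiles
// without the library.
class InMemoryStore
{
public:
  enum OpenMode { ReadOnly = 0, ReadWriteTruncate, ReadWriteAppend };

  bool isOpen() const { return m_open; }

  bool openFile(const std::string& name, OpenMode = ReadWriteAppend)
  {
    if (m_open || name.empty())
      return false; // only one file may be open at a time
    m_open = true;
    return true;
  }

  bool closeFile()
  {
    bool wasOpen = m_open;
    m_open = false;
    return wasOpen;
  }

  bool writeDataset(const std::string& path,
                    const std::vector<double>& data) const
  {
    if (!m_open)
      return false;
    m_data[path] = data;
    return true;
  }

  std::vector<int> readDataset(const std::string& path,
                               std::vector<double>& data) const
  {
    auto it = m_data.find(path);
    if (!m_open || it == m_data.end())
      return {}; // an empty vector signals an error
    data = it->second;
    return { static_cast<int>(data.size()) };
  }

private:
  bool m_open = false;
  mutable std::map<std::string, std::vector<double>> m_data;
};

// Open a file, write one dataset, read it back, and close the file.
// Store may be the stand-in above or any class with the same members.
template <typename Store>
bool roundTrip(Store& store, const std::string& fileName,
               const std::string& path, const std::vector<double>& in,
               std::vector<double>& out)
{
  assert(!path.empty() && path[0] == '/'); // dataset paths are absolute
  if (!store.openFile(fileName, Store::ReadWriteTruncate))
    return false;
  bool ok = store.writeDataset(path, in);
  ok = ok && !store.readDataset(path, out).empty();
  // Close even if a step failed, so the file is always released.
  return store.closeFile() && ok;
}
```

The function closes the file on every path after a successful open, which matches the advice above to call closeFile when finished.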

A complete set of datasets available in an open file can be retrieved with the datasets() method, and the existence of a particular dataset can be tested with datasetExists(). removeDataset() can be used to unlink an existing dataset from the file, though this will not free any space on disk. The space occupied by an unlinked dataset may be reclaimed by new write operations, but only if they occur before the file is closed.

A convenient thresholding system is implemented to help the accompanying text format writer determine which data is "large" enough to be stored in HDF5. A size threshold (in bytes) may be set with the setThreshold() function (the default is 1 KB). A data object may be passed to the exceedsThreshold method to see if the size of the data in the container exceeds the currently set threshold. If so, it should be written into the HDF5 file by writeDataset. If not, it should be serialized into the text file in a suitable format. The thresholding operations are optional; the threshold size does not affect the behavior of the read/write methods and is provided only for user convenience.
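A minimal sketch of how a text format writer might use the threshold to route data. ThresholdStore is a hypothetical stand-in for Hdf5DataFormat, and routeData, Destination, and the space-separated text layout are illustrative only. The stand-in assumes that "1KB" means 1024 bytes and that a std::vector<double> is measured as size() * sizeof(double); neither detail is stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in, NOT part of AvogadroLibs; it mirrors the documented
// threshold and write signatures.
class ThresholdStore
{
public:
  void setThreshold(std::size_t bytes) { m_threshold = bytes; }
  std::size_t threshold() const { return m_threshold; }

  // True only if the size is strictly larger than the threshold.
  bool exceedsThreshold(std::size_t bytes) const
  {
    return bytes > m_threshold;
  }

  bool exceedsThreshold(const std::vector<double>& data) const
  {
    return exceedsThreshold(data.size() * sizeof(double)); // assumed measure
  }

  bool writeDataset(const std::string& path,
                    const std::vector<double>& data) const
  {
    m_data[path] = data;
    return true;
  }

  bool datasetExists(const std::string& path) const
  {
    return m_data.count(path) != 0;
  }

private:
  std::size_t m_threshold = 1024; // assumption: "1KB" means 1024 bytes
  mutable std::map<std::string, std::vector<double>> m_data;
};

enum class Destination { Text, Hdf5, Failed };

// Send large data to the HDF5 store and small data to the text stream.
template <typename Store>
Destination routeData(const Store& store, const std::string& path,
                      const std::vector<double>& data, std::ostream& text)
{
  assert(!path.empty() && path[0] == '/'); // dataset paths are absolute
  if (store.exceedsThreshold(data))
    return store.writeDataset(path, data) ? Destination::Hdf5
                                          : Destination::Failed;
  // Small data is serialized inline; this layout is purely illustrative.
  for (double value : data)
    text << value << ' ';
  return Destination::Text;
}
```

A real writer would also record in the text file where the HDF5 copy lives; that bookkeeping belongs to the accompanying format and is left out here.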

Member Enumeration Documentation

◆ OpenMode

enum OpenMode

Open modes for use with openFile().

Enumerator
ReadOnly 

Open an existing file in read-only mode. The file must exist.

ReadWriteTruncate 

Create a file in read/write mode, removing any existing file with the same name.

ReadWriteAppend 

Open a file in read/write mode. If the file exists, its contents will be preserved. If it does not, a new file will be created.

Constructor & Destructor Documentation

◆ ~Hdf5DataFormat()

Destructor. Closes any open file before freeing memory.

Member Function Documentation

◆ isOpen()

bool isOpen ( ) const
Returns
true if a file is open.

◆ openFile()

bool openFile ( const std::string &  filename_,
OpenMode  mode = ReadWriteAppend 
)
Parameters
filename_  Name of the file to open.
mode       OpenMode for the file. Default is ReadWriteAppend.
Note
Only a single file may be opened at a time. Attempting to open multiple files without calling closeFile() will fail.
Returns
true if the file is successfully opened/created by the HDF5 subsystem, false otherwise.
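Because only one file may be open at a time, code that moves from one file to another has to close the first before opening the second. A minimal sketch, using a hypothetical stand-in (SingleFileStore) and a hypothetical helper (switchFile); neither is part of the documented API:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in, NOT part of AvogadroLibs; it mirrors the documented
// open/close signatures and the one-file-at-a-time rule.
class SingleFileStore
{
public:
  enum OpenMode { ReadOnly = 0, ReadWriteTruncate, ReadWriteAppend };

  bool isOpen() const { return !m_name.empty(); }

  bool openFile(const std::string& name, OpenMode = ReadWriteAppend)
  {
    if (isOpen() || name.empty())
      return false; // a second open without closeFile() fails
    m_name = name;
    return true;
  }

  std::string filename() const { return m_name; }

  bool closeFile()
  {
    bool wasOpen = isOpen();
    m_name.clear();
    return wasOpen;
  }

private:
  std::string m_name;
};

// Close whatever file is currently open, then open fileName.
template <typename Store>
bool switchFile(Store& store, const std::string& fileName,
                typename Store::OpenMode mode)
{
  assert(!fileName.empty());
  if (store.isOpen() && !store.closeFile())
    return false; // leave the store untouched if the close fails
  return store.openFile(fileName, mode);
}
```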

◆ filename()

std::string filename ( ) const
Returns
The name of the open file, or an empty string if no file is open.

◆ closeFile()

bool closeFile ( )
Returns
true if the file is successfully released by the HDF5 subsystem.

◆ setThreshold()

void setThreshold ( size_t  bytes)
Parameters
bytes  The size in bytes for the threshold. Default: 1KB.

◆ threshold()

size_t threshold ( ) const
Returns
The current threshold size in bytes. Default: 1KB.

◆ exceedsThreshold() [1/4]

bool exceedsThreshold ( size_t  bytes) const
Parameters
bytes  The size of the dataset in bytes.
Returns
true if the size exceeds the threshold set by setThreshold.

◆ exceedsThreshold() [2/4]

bool exceedsThreshold ( const MatrixX &  data) const
Parameters
data  Data object to test.
Returns
true if the size of the serializable data in data exceeds the threshold set by setThreshold.

◆ exceedsThreshold() [3/4]

bool exceedsThreshold ( const std::vector< double > &  data) const
Parameters
data  Data object to test.
Returns
true if the size of the serializable data in data exceeds the threshold set by setThreshold.

◆ exceedsThreshold() [4/4]

bool exceedsThreshold ( const Core::Array< double > &  data) const
Parameters
data  Data object to test.
Returns
true if the size of the serializable data in data exceeds the threshold set by setThreshold.

◆ datasetExists()

bool datasetExists ( const std::string &  path) const
Parameters
path  An absolute path into the HDF5 data.
Returns
true if the object at path both exists and is a dataset, false otherwise.

◆ removeDataset()

bool removeDataset ( const std::string &  path) const
Parameters
path  An absolute path into the HDF5 data.
Returns
true if the dataset exists and has been successfully removed.
Warning
Removing datasets can be expensive in terms of file size, as deleted space cannot be reclaimed by HDF5 once the file is closed, and the file will not decrease in size as datasets are removed. For details, see http://www.hdfgroup.org/HDF5/doc/H5.user/Performance.html#Freespace.
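Since unlinked space can only be reused before the file is closed, one way to limit file growth is to remove and rewrite a dataset within the same open session. The sketch below assumes a hypothetical stand-in (ReplaceableStore) and a hypothetical helper (replaceDataset). Whether writeDataset replaces an existing dataset on its own is not stated here, so the helper unlinks explicitly.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in, NOT part of AvogadroLibs; it mirrors the documented
// datasetExists/removeDataset/writeDataset signatures.
class ReplaceableStore
{
public:
  bool datasetExists(const std::string& path) const
  {
    return m_data.count(path) != 0;
  }

  bool removeDataset(const std::string& path) const
  {
    if (m_data.erase(path) == 0)
      return false; // nothing to remove
    ++m_removals;
    return true;
  }

  bool writeDataset(const std::string& path,
                    const std::vector<double>& data) const
  {
    m_data[path] = data;
    return true;
  }

  int removals() const { return m_removals; } // for illustration only

private:
  mutable std::map<std::string, std::vector<double>> m_data;
  mutable int m_removals = 0;
};

// Replace the dataset at path without closing the file in between, so the
// space freed by the removal may be reused by the new write.
template <typename Store>
bool replaceDataset(const Store& store, const std::string& path,
                    const std::vector<double>& data)
{
  assert(!path.empty() && path[0] == '/'); // dataset paths are absolute
  if (store.datasetExists(path) && !store.removeDataset(path))
    return false;
  return store.writeDataset(path, data);
}
```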

◆ datasetDimensions()

std::vector< int > datasetDimensions ( const std::string &  path) const
Parameters
path  An absolute path into the HDF5 data.
Returns
A vector containing the dimensionality of the data, major dimension first. If an error is encountered, an empty vector is returned.

◆ writeDataset() [1/3]

bool writeDataset ( const std::string &  path,
const MatrixX &  data 
) const
Parameters
path  An absolute path into the HDF5 data.
data  The data container to serialize to HDF5.
Returns
true if the data is successfully written, false otherwise.

◆ writeDataset() [2/3]

bool writeDataset ( const std::string &  path,
const std::vector< double > &  data,
int  ndims = 1,
size_t *  dims = nullptr 
) const
Parameters
path   An absolute path into the HDF5 data.
data   The data container to serialize to HDF5.
ndims  The number of dimensions in the data. Default: 1.
dims   The dimensionality of the data, major dimension first. Default: data.size().
Note
Since std::vector is a flat container, the dimensionality data is only used to set up the dataset metadata in the HDF5 container. Omitting the dimensionality parameters will write a flat array.
Returns
true if the data is successfully written, false otherwise.
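As an illustration of the ndims and dims parameters, the following sketch writes a flat vector as a two-dimensional dataset. ShapedStore is a hypothetical stand-in that only records the shape it is given, and writeGrid is a hypothetical helper. The check that rows * cols matches the element count is done by the helper; the documentation does not say whether the library validates this itself.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in, NOT part of AvogadroLibs; it mirrors the documented
// writeDataset/datasetDimensions signatures and stores only the shape.
class ShapedStore
{
public:
  bool writeDataset(const std::string& path, const std::vector<double>& data,
                    int ndims = 1, std::size_t* dims = nullptr) const
  {
    std::vector<int> shape;
    if (dims == nullptr) {
      shape.push_back(static_cast<int>(data.size())); // flat array
    } else {
      for (int i = 0; i < ndims; ++i)
        shape.push_back(static_cast<int>(dims[i]));
    }
    m_shapes[path] = shape;
    return true;
  }

  std::vector<int> datasetDimensions(const std::string& path) const
  {
    auto it = m_shapes.find(path);
    return it == m_shapes.end() ? std::vector<int>() : it->second;
  }

private:
  mutable std::map<std::string, std::vector<int>> m_shapes;
};

// Write a flat vector as a rows x cols dataset, major dimension first.
template <typename Store>
bool writeGrid(const Store& store, const std::string& path,
               const std::vector<double>& flat, std::size_t rows,
               std::size_t cols)
{
  assert(!path.empty() && path[0] == '/'); // dataset paths are absolute
  if (rows * cols != flat.size())
    return false; // the shape must account for every element
  std::size_t dims[2] = { rows, cols };
  return store.writeDataset(path, flat, 2, dims);
}
```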

◆ writeDataset() [3/3]

bool writeDataset ( const std::string &  path,
const Core::Array< double > &  data,
int  ndims = 1,
size_t *  dims = nullptr 
) const
Parameters
path   An absolute path into the HDF5 data.
data   The data container to serialize to HDF5.
ndims  The number of dimensions in the data. Default: 1.
dims   The dimensionality of the data, major dimension first. Default: data.size().
Note
Since this is a flat container, the dimensionality data is only used to set up the dataset metadata in the HDF5 container. Omitting the dimensionality parameters will write a flat array.
Returns
true if the data is successfully written, false otherwise.

◆ readDataset() [1/3]

bool readDataset ( const std::string &  path,
MatrixX &  data 
) const
Parameters
path  An absolute path into the HDF5 data.
data  The data container into which the HDF5 data shall be deserialized. data will be resized to fit the data.
Returns
true if the data is successfully read, false otherwise. If the read fails, the data object may be left in an unpredictable state.

◆ readDataset() [2/3]

std::vector< int > readDataset ( const std::string &  path,
std::vector< double > &  data 
) const
Parameters
path  An absolute path into the HDF5 data.
data  The data container into which the HDF5 data shall be deserialized. data will be resized to fit the data.
Returns
A vector containing the dimensionality of the dataset, major dimension first. If an error occurs, an empty vector is returned and *data will be set to nullptr.

◆ readDataset() [3/3]

std::vector< int > readDataset ( const std::string &  path,
Core::Array< double > &  data 
) const
Parameters
path  An absolute path into the HDF5 data.
data  The data container into which the HDF5 data shall be deserialized. data will be resized to fit the data.
Returns
A vector containing the dimensionality of the dataset, major dimension first. If an error occurs, an empty vector is returned and *data will be set to nullptr.
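The dimension vector returned by the flat-container overloads can be used to check and index the data that was read. The helpers below (elementCount and at) are hypothetical and use no library calls. The indexing assumes that the flat data is laid out row-major, which "major dimension first" suggests but which is not stated explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of elements implied by a dimension vector. Returns 0 for an empty
// vector, which is how readDataset reports an error.
inline std::size_t elementCount(const std::vector<int>& dims)
{
  if (dims.empty())
    return 0;
  std::size_t count = 1;
  for (int extent : dims) {
    if (extent < 0)
      return 0; // treat a negative extent as invalid
    count *= static_cast<std::size_t>(extent);
  }
  return count;
}

// Element (row, col) of a 2D dataset held in a flat vector.
// Assumption: row-major layout, with dims = { rows, cols }.
inline double at(const std::vector<double>& data, const std::vector<int>& dims,
                 std::size_t row, std::size_t col)
{
  assert(dims.size() == 2 && elementCount(dims) == data.size());
  const std::size_t rows = static_cast<std::size_t>(dims[0]);
  const std::size_t cols = static_cast<std::size_t>(dims[1]);
  assert(row < rows && col < cols);
  return data[row * cols + col];
}
```

Comparing elementCount(dims) with data.size() after a read is a cheap sanity check before indexing.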

◆ datasets()

std::vector< std::string > datasets ( ) const
Returns
A list of datasets in the current file.
Warning
The list is not cached internally and is recalculated on each call. This may be expensive on large HDF5 files, so external caching is recommended if this data is frequently needed.
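One possible shape for the external caching recommended above is a small wrapper that calls datasets() once and reuses the result until it is told the file has changed. DatasetListCache and CountingStore are hypothetical names; the stand-in exists only to count how often the list is recomputed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Caches the result of Store::datasets() until invalidate() is called.
// The store must outlive the cache, which holds a reference to it.
template <typename Store>
class DatasetListCache
{
public:
  explicit DatasetListCache(const Store& store) : m_store(store) {}

  const std::vector<std::string>& list()
  {
    if (!m_valid) {
      m_list = m_store.datasets(); // the expensive traversal happens here
      m_valid = true;
    }
    return m_list;
  }

  // Call after writing or removing datasets, or after switching files.
  void invalidate() { m_valid = false; }

private:
  const Store& m_store;
  std::vector<std::string> m_list;
  bool m_valid = false;
};

// Hypothetical stand-in, NOT part of AvogadroLibs; it mirrors the documented
// datasets() signature and counts the calls it receives.
class CountingStore
{
public:
  std::vector<std::string> datasets() const
  {
    ++m_calls;
    std::vector<std::string> names;
    names.push_back("/a");
    names.push_back("/b");
    return names;
  }

  int calls() const { return m_calls; }

private:
  mutable int m_calls = 0;
};
```

The cache cannot detect changes by itself, so the caller is responsible for calling invalidate() after any operation that adds or removes datasets.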

The documentation for this class was generated from the following file: