SPDP Compression Filter 1.1 for HDF5


SPDP is a fast, lossless, unified compression/decompression filter for HDF5 that has been designed for both 32-bit single-precision (float) and 64-bit double-precision (double) floating-point data. It also works on other data. A standalone compressor and description of the SPDP algorithm are available here.

The following steps illustrate how to install and use the SPDP filter. These steps have only been tested in a GNU/Linux environment. Note that SPDP is protected by the license included at the beginning of the code.

  1. Download H5Zspdp.c and the makefile.
  2. Edit the HDF5_INSTALL location in the makefile to point to the base directory of your hdf5 installation. If you are using a parallel hdf5 installation with MPI, then also change MPI_INSTALL to point to the base directory of your MPI installation. If not, then just leave MPI_INSTALL blank as is.
  3. Make the library by typing make.
  4. You should now have a shared object file named libH5Zspdp.so.1.0. The simplest way to make the library visible to hdf5 is to set the HDF5_PLUGIN_PATH environment variable to point to the directory containing the SPDP library that you just made. This can be done by executing the following command while in the directory containing libH5Zspdp.so.1.0:

       export HDF5_PLUGIN_PATH=$PWD

  5. You do not need to include any additional files in your hdf5 application code in order to use SPDP. But you do need the SPDP identification number when calling H5Pset_filter(). The filter ID for SPDP is 32009. This number is how you reference SPDP from your application code. hdf5 will automatically search for shared objects in $HDF5_PLUGIN_PATH to see if the library with the correct ID can be found there.
  6. To set the SPDP filter on a dataset, all you have to do in your application code is call H5Pset_filter(). Here is the signature of the function:

       herr_t H5Pset_filter(hid_t plist_id, H5Z_filter_t filter_id, unsigned int flags, size_t cd_nelmts, const unsigned int cd_values[])

You need five parameters to set a filter. The first is the property list of the dataset that you want to apply the filter to. Then you need to pass the filter ID of the filter you want to apply. To use SPDP, you would pass the literal value of 32009. Next you need to pass an integer that specifies whether this filter is optional or not. Since you must have the SPDP filter when you decompress data, you should pass the pre-defined macro H5Z_FLAG_MANDATORY to indicate that the filter is required for decoding. Then the last two arguments are passed to the SPDP algorithm itself. For SPDP, there is only one argument that you can pass to the filter, which is an integer from 0 to 9 that represents the desired level of compression (9 yields the highest compression ratio and slowest runtime). Since there is only one argument, cd_nelmts should be the literal value 1 and cd_values should be an array of only one element containing the level.

Here is an example call to set the SPDP filter on a dataset with a property list dcpl_id.

const unsigned int SPDP_args[] = {4}; /* compression level of 4; unsigned to match cd_values */
herr_t status = H5Pset_filter(dcpl_id, (H5Z_filter_t)32009, H5Z_FLAG_MANDATORY, 1, SPDP_args);

If this call is successful, the returned status is non-negative; a negative value indicates failure. You can now write to your datasets like you normally would. No extra consideration is needed when reading from a dataset that is compressed with SPDP (besides having the library available to hdf5). In other words, you don't have to call the H5Pset_filter function to read a dataset.

Additional Considerations:

  1. Your datasets must be chunked in order to use any filter. For SPDP and most filters, the recommended size of each chunk is around a megabyte. Avoid small chunk sizes below a few kilobytes since the extra meta information required will reduce the compression ratio and hurt performance. Very small chunk sizes will actually use a huge amount of memory and bring your system to a crawl.
  2. If you have existing .h5 files that you want to compress using SPDP, you can use the h5repack utility to do so. For SPDP, you would do something like the following:

       h5repack -f UD=32009,1,4 file_to_compress.h5 target_name.h5

You can change the 4 to any number between 0 and 9. Again, the filter needs to be visible to hdf5 for this to work.
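To make the chunk-size advice in item 1 above concrete, here is a small sketch of one way to pick a 2-D chunk shape of roughly one megabyte. The function name pick_chunk_dims, the 1 MiB target constant, and the whole-rows-first heuristic are assumptions for illustration, not something prescribed by SPDP or hdf5. The code is plain C and does not call into hdf5.

```c
#include <stddef.h>

#define TARGET_CHUNK_BYTES (1024ULL * 1024ULL) /* about one megabyte */

/* Hypothetical helper: choose a chunk shape for a rows x cols dataset whose
 * elements are elem_size bytes each. Prefers chunks made of whole rows; if a
 * single row already reaches the target, the row is split instead.
 * Returns 0 on success, -1 if any argument is zero. */
int pick_chunk_dims(unsigned long long rows, unsigned long long cols,
                    size_t elem_size, unsigned long long chunk[2])
{
    unsigned long long row_bytes;

    if (rows == 0 || cols == 0 || elem_size == 0)
        return -1;

    row_bytes = cols * (unsigned long long)elem_size;

    if (row_bytes >= TARGET_CHUNK_BYTES) {
        /* One row is already large: use part of a single row per chunk. */
        chunk[0] = 1;
        chunk[1] = TARGET_CHUNK_BYTES / elem_size;
        if (chunk[1] == 0)
            chunk[1] = 1;      /* element larger than the target */
        if (chunk[1] > cols)
            chunk[1] = cols;
    } else {
        /* Pack as many whole rows as fit under the target. */
        chunk[0] = TARGET_CHUNK_BYTES / row_bytes;
        if (chunk[0] > rows)
            chunk[0] = rows;   /* small dataset: one chunk holds it all */
        chunk[1] = cols;
    }
    return 0;
}
```

The resulting values can be copied into an hsize_t array and passed to H5Pset_chunk(). Datasets with a different access pattern (for example, column-wise reads) may be better served by a different shape with a similar byte size.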

Extra Material:

Below are links to a couple of documents you may want to read to familiarize yourself with third-party filters.

https://www.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf
https://www.hdfgroup.org/HDF5/faq/compression.html