Matthieu Haefele edited this page Mar 14, 2019 · 2 revisions

Parallel HDF5

Parallel HPC systems covered here:

  • Shared memory.
  • Distributed memory.

In the context of this course, MPI will be used to handle communication between processes.

Four classes of communications:

  • Collective - blocking
  • Collective - non blocking
  • Point-to-Point - blocking
  • Point-to-Point - non blocking

Parallel file systems

Parallel file systems separate metadata from file data blocks. Because one file can be stored across multiple drives, the aggregate bandwidth of all the drives can be leveraged to read and write the file in parallel (and therefore faster). This is known as striping: the file is divided into stripes that are distributed over the drives. The application, however, still sees an ordinary serial file system.
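The striping idea above can be sketched with a little arithmetic. The function below is an illustration only: it assumes simple round-robin striping with a fixed stripe size (roughly how Lustre places stripes with a default layout); `stripe_location` is a name invented for this sketch, not a file-system API.

```python
def stripe_location(offset, stripe_size, n_targets):
    """Map a byte offset in a striped file to (stripe index, storage target).

    Assumes round-robin striping with a fixed stripe size (illustrative).
    """
    stripe = offset // stripe_size   # which stripe the byte falls in
    target = stripe % n_targets      # which drive/target holds that stripe
    return stripe, target

# A file striped in 1 MiB stripes over 4 targets: the first MiB lands on
# target 0, the next MiB on target 1, and so on, wrapping around.
print(stripe_location(0, 1 << 20, 4))                   # (0, 0)
print(stripe_location(5 * (1 << 20) + 7, 1 << 20, 4))   # (5, 1)
```

Consecutive stripes land on different drives, which is why large contiguous accesses can use all drives at once.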

An example:

Let us consider a 2D structured array of size $S \times S$, distributed in a block-block fashion over $P = p_x p_y$ cores.

With the simplest approach, each MPI process writes its local block to its own file, so a single distributed array is spread across as many files as there are MPI ranks. This is easy to implement, but it produces a large number of files and makes post-processing harder, since the distribution must be reassembled afterwards.
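The block-block distribution above can be made concrete with a short sketch. This is an illustration under simplifying assumptions: $S$ divisible by $p_x$ and $p_y$, and a row-major ordering of ranks on the process grid; `local_block` is a name invented here.

```python
def local_block(rank, S, px, py):
    """Return (row_start, col_start, rows, cols) owned by `rank` in a
    block-block distribution of an S x S array over px * py cores.

    Assumes S is divisible by px and py, and row-major rank ordering.
    """
    bx, by = S // px, S // py        # block extents per rank
    rx, ry = rank // py, rank % py   # rank's coordinates on the process grid
    return rx * bx, ry * by, bx, by

# An 8x8 array on a 2x2 process grid: rank 3 owns the bottom-right 4x4 block.
print(local_block(3, 8, 2, 2))   # (4, 4, 4, 4)
```

With one file per rank, each process would simply dump its `(rows, cols)` block; the offsets only matter once the blocks must be recombined into a single global array.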


MPI-IO

The MPI implementation takes care of writing a single contiguous file on disk from the distributed data. The result is identical to the gather + POSIX approach, but MPI-IO performs the gather operation inside the MPI library itself.

The advantages are:

  • No single-node memory limitation (no explicit gather needed)
  • Single resulting file.

The disadvantages are:

  • MPI derived datatypes must be defined to describe the data distribution
  • Performance depends on the quality of the MPI library's I/O implementation
| Positioning | Synchronism | Coordination: non-collective | Coordination: collective |
|---|---|---|---|
| Explicit offsets | Blocking | `MPI_FILE_READ_AT`, `MPI_FILE_WRITE_AT` | `MPI_FILE_READ_AT_ALL`, `MPI_FILE_WRITE_AT_ALL` |
| Explicit offsets | Non-blocking & split call | `MPI_FILE_IREAD_AT`, `MPI_FILE_IWRITE_AT` | `MPI_FILE_READ_AT_ALL_BEGIN`, `MPI_FILE_READ_AT_ALL_END`, `MPI_FILE_WRITE_AT_ALL_BEGIN`, `MPI_FILE_WRITE_AT_ALL_END` |
| Individual file pointers | Blocking | `MPI_FILE_READ`, `MPI_FILE_WRITE` | `MPI_FILE_READ_ALL`, `MPI_FILE_WRITE_ALL` |
| Individual file pointers | Non-blocking & split call | `MPI_FILE_IREAD`, `MPI_FILE_IWRITE` | `MPI_FILE_READ_ALL_BEGIN`, `MPI_FILE_READ_ALL_END`, `MPI_FILE_WRITE_ALL_BEGIN`, `MPI_FILE_WRITE_ALL_END` |
| Shared file pointers | Blocking | `MPI_FILE_READ_SHARED`, `MPI_FILE_WRITE_SHARED` | `MPI_FILE_READ_ORDERED`, `MPI_FILE_WRITE_ORDERED` |
| Shared file pointers | Non-blocking & split call | `MPI_FILE_IREAD_SHARED`, `MPI_FILE_IWRITE_SHARED` | `MPI_FILE_READ_ORDERED_BEGIN`, `MPI_FILE_READ_ORDERED_END`, `MPI_FILE_WRITE_ORDERED_BEGIN`, `MPI_FILE_WRITE_ORDERED_END` |
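A minimal sketch of the blocking, collective, explicit-offset case (`MPI_FILE_WRITE_AT_ALL` in the table) using mpi4py's wrappers. The MPI calls (`MPI.File.Open`, `Write_at_all`) are the real mpi4py API; the helper names and the assumption of equal-sized contiguous blocks, one per rank, are inventions of this sketch. The MPI import is deferred so the offset arithmetic can be read and tested without an MPI installation.

```python
def rank_offset(rank, nbytes_per_rank):
    """Byte offset at which `rank` writes its block.

    Assumes every rank writes a contiguous block of the same size
    (illustrative assumption).
    """
    return rank * nbytes_per_rank

def write_block_collective(filename, payload):
    """Each rank writes `payload` (bytes-like, same length on every rank)
    at its own offset, using the blocking collective MPI_FILE_WRITE_AT_ALL.

    Requires mpi4py; run under e.g. `mpirun -np 4 python script.py`.
    """
    from mpi4py import MPI   # deferred: only needed when actually writing
    comm = MPI.COMM_WORLD
    fh = MPI.File.Open(comm, filename, MPI.MODE_CREATE | MPI.MODE_WRONLY)
    fh.Write_at_all(rank_offset(comm.Get_rank(), len(payload)), payload)
    fh.Close()

print(rank_offset(3, 1024))   # 3072: rank 3 starts 3 KiB into the file
```

Because the call is collective, every rank must reach `Write_at_all`, which lets the MPI library aggregate the blocks into large, well-aligned file accesses.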

Parallel HDF5

Parallel HDF5 attempts to automate this process for you in a pain-free manner.

The main properties are:

  • Built on top of MPI-IO
  • Data distribution is described thanks to HDF5 hyperslabs
  • Result is a single portable HDF5 file

Pros:

  • Easy to develop
  • Single portable file

Cons:

  • Possible performance overhead compared to hand-tuned MPI-IO
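A minimal parallel-HDF5 sketch using h5py on top of mpi4py. The `driver="mpio"` file driver and hyperslab assignment via slicing are real h5py features (they require an h5py build with parallel HDF5 support); the helper names and the assumption that each rank owns an equal contiguous block of rows are inventions of this sketch. Imports are deferred so the slab arithmetic stays readable and testable without MPI.

```python
def row_slab(rank, rows_per_rank):
    """Start row of `rank`'s hyperslab, assuming equal contiguous
    row blocks, one per rank (illustrative assumption)."""
    return rank * rows_per_rank

def write_phdf5(filename, local_rows):
    """Collectively create one HDF5 file; each rank writes its own
    hyperslab of the global dataset.

    Requires mpi4py and an h5py built with parallel (MPI) support;
    run under e.g. `mpirun -np 4 python script.py`.
    """
    from mpi4py import MPI   # deferred: only needed when actually writing
    import h5py
    comm = MPI.COMM_WORLD
    n, m = len(local_rows), len(local_rows[0])
    with h5py.File(filename, "w", driver="mpio", comm=comm) as f:
        dset = f.create_dataset("data", (comm.Get_size() * n, m), dtype="f8")
        start = row_slab(comm.Get_rank(), n)
        dset[start:start + n, :] = local_rows   # hyperslab selection

print(row_slab(2, 10))   # 20: rank 2's slab starts at global row 20
```

The result is one portable HDF5 file describing the full global array; the per-rank hyperslabs exist only during the write.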

Examples of IO technologies with typical associated use case

Scientific results / diagnostics:

  • Multiple POSIX files in ASCII or binary
  • MPI-IO
  • pHDF5
  • XIOS

Restart files:

  • SIONlib
  • ADIOS
| Technology | Abstraction | API | Purpose | Hardware | Format | Single/multi file | Online post-processing |
|---|---|---|---|---|---|---|---|
| POSIX | Stream | Imperative | General | No | Binary | Multi | No |
| MPI-IO | Stream | Imperative | General | No | Binary | Single | No |
| pHDF5 | Object | Imperative | General | No | HDF5 | Single/Multi | No |
| XIOS | Object | Declarative | General | No | NetCDF/HDF5 | Single | Yes |
| SIONlib | Stream | Imperative | General | No | Binary | Multi++ | No |
| ADIOS | Object | Decl./Imp. | General | Yes | NetCDF/HDF5 | Single/Multi | Yes |
| FTI | Object | Declarative | Specific | Yes | Binary | N/A | No |