Damsel: A Data Model Storage Library for Exascale Science
[Abstract] Computational science applications have been described as having one of seven motifs (the "seven dwarfs"), each having a particular pattern of computation and communication. From a storage and I/O perspective, these applications can also be grouped into a number of data model motifs describing the way data is organized and accessed during simulation, analysis, and visualization. Major storage data model projects developed in the 1990s, such as the Network Common Data Format (netCDF) and the Hierarchical Data Format (HDF), created support for more complex data models. Development of both netCDF and HDF5 was influenced by multi-dimensional dataset storage requirements, but their access models and formats were designed with sequential storage in mind (e.g., a POSIX I/O model). Although these and other high-level I/O libraries have had a beneficial impact on large parallel applications, they do not always attain a high percentage of peak I/O performance due to fundamental design limitations, and they do not address the full range of current and future computational science data models. The goal of this project is to enable exascale computational science applications to interact conveniently and efficiently with storage through abstractions that match their data models. The project consists of three major activities: (1) identifying major data model motifs in computational science applications and developing representative benchmarks; (2) developing a data model storage library, called Damsel, that supports these motifs, provides efficient storage data layouts, incorporates optimizations to enable exascale operation, and is tolerant to failures; and (3) productizing Damsel and working with computational scientists to encourage adoption of this library by the scientific community. The product of this project, the Damsel library, is openly available for download from http://cucis.ece.northwestern.edu/projects/DAMSEL.
Several case studies and an application programming interface reference are also available to help new users learn the library.
[Publication date] 2014-07-11 [Publishing institution]
[Validity level] [Subject classification] Mathematics (General)
[Keywords] parallel I/O; data model; I/O library; data storage [Timeliness]