A note on precision-preserving compression of scientific data
[Abstract] Lossy compression of scientific data arrays is a powerful tool for saving network bandwidth and storage space. Properly applied, lossy compression can reduce the size of a dataset by orders of magnitude while keeping all essential information, whereas a wrong choice of lossy compression parameters leads to the loss of valuable data. An important class of lossy compression methods is so-called precision-preserving compression, which guarantees that a certain precision of each number is kept. The paper considers statistical properties of several precision-preserving compression methods implemented in NetCDF Operators (NCO), a popular tool for handling and transforming numerical data in NetCDF format. We compare artifacts resulting from the use of precision-preserving compression of floating-point data arrays. In particular, we show that the popular Bit Grooming algorithm (the default in NCO until recently) has suboptimal accuracy and produces substantial artifacts in multipoint statistics. We suggest a simple implementation of two algorithms that are free from these artifacts and provide twice the precision. One of them can be used to rectify data already processed with Bit Grooming. We compare precision trimming for relative and absolute precision to the popular linear packing (LP) method and find that LP has no advantage over precision trimming at a given maximum absolute error. We give examples where LP leads to an unconstrained error in an integral characteristic of a field or produces unphysical values. We analyze compression efficiency as a function of target precision for two synthetic datasets and discuss the precision needed for several atmospheric fields. Mantissa rounding has been contributed to the NCO mainstream as a replacement for Bit Grooming. The Appendix contains code samples implementing precision trimming in Python3 and Fortran 95.
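To illustrate the mantissa rounding (precision trimming) discussed in the abstract, below is a minimal Python3/NumPy sketch of one common bit-level implementation: round-half-up on the integer representation of float32, keeping a chosen number of explicit mantissa bits and zeroing the rest. This is an illustrative assumption, not the exact NCO algorithm; the name round_mantissa and the keep_bits parameter are hypothetical.

import numpy as np

def round_mantissa(data, keep_bits):
    # Sketch (assumed implementation): keep `keep_bits` explicit mantissa bits
    # of IEEE 754 float32 values (1 <= keep_bits <= 22) and zero the trailing
    # bits, so that a subsequent lossless coder (e.g. DEFLATE inside NetCDF)
    # compresses the array better.
    a = np.asarray(data, dtype=np.float32)
    bits = a.view(np.uint32)                          # reinterpret as raw bit patterns
    drop = 23 - int(keep_bits)                        # trailing mantissa bits to discard
    half = np.uint32(1 << (drop - 1))                 # 0.5 ulp at the retained precision
    mask = np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)  # keeps sign, exponent, leading bits
    # Round to nearest by adding half an ulp, then truncate; a mantissa carry
    # propagates correctly into the exponent. NaN/Inf are not treated specially.
    return ((bits + half) & mask).view(np.float32)

# Example: keep 10 mantissa bits (roughly 3 significant decimal digits)
x = np.array([0.1, 1234.5678, -3.14159], dtype=np.float32)
print(round_mantissa(x, 10))

The zeroed trailing bits carry no information, so the trimmed array compresses much better under the lossless back-end; unlike Bit Grooming, which alternately sets the discarded bits to 0 and 1, rounding them to the nearest representable value halves the maximum quantization error.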
[Subject Classification] Astronomy (General)