We present a multilevel technique for the compression and reduction of univariate data and give an optimal complexity algorithm for its implementation. A hierarchical scheme offers the flexibility to ...produce multiple levels of partial decompression of the data so that each user can work with a reduced representation that requires minimal storage whilst achieving the required level of tolerance. The algorithm is applied to the case of turbulence modelling in which the datasets are traditionally not only extremely large but inherently non-smooth and, as such, rather resistant to compression. We decompress the data for a range of relative errors, carry out the usual analysis procedures for turbulent data, and compare the results of the analysis on the reduced datasets to the results that would be obtained on the full dataset. The results obtained demonstrate the promise of multilevel compression techniques for the reduction of data arising from large scale simulations of complex phenomena such as turbulence modelling.
Full text
Available for:
FZAB, GEOZS, IJS, IMTLJ, KILJ, KISLJ, MFDPS, NUK, OBVAL, OILJ, PNG, SAZU, SBCE, SBJE, SBMB, SBNM, UKNU, UL, UM, UPUK, VKSCE, ZAGLJ
We present ADIOS 2, the latest version of the Adaptable Input Output (I/O) System. ADIOS 2 addresses scientific data management needs ranging from scalable I/O in supercomputers, to data analysis in ...personal computer and cloud systems. Version 2 introduces a unified application programming interface (API) that enables seamless data movement through files, wide-area-networks, and direct memory access, as well as high-level APIs for data analysis. The internal architecture provides a set of reusable and extendable components for managing data presentation and transport mechanisms for new applications. ADIOS 2 bindings are available in C++11, C, Fortran, Python, and Matlab and are currently used across different scientific communities. ADIOS 2 provides a communal framework to tackle data management challenges as we approach the exascale era of supercomputing.
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UILJ, UL, UM, UPCLJ, UPUK, ZAGLJ, ZRSKP
Scientific applications continue to grow and produce extremely large amounts of data, which require efficient compression algorithms for long-term storage. Compression errors in scientific ...applications can have a deleterious impact on downstream processing. Thus, it is crucial to preserve all the “known” Quantities of Interest (QoI) during compression. To address this issue, most existing approaches guarantee the reconstruction error of the original data or primary data (PD), but cannot directly control the problem of preserving the QoI. In this work, we propose a physics-informed compression technique that is composed of two parts: (i) reduction of the PD with bounded errors and (ii) preservation of the QoI. In the first step, we combine tensor decompositions, autoencoders, product quantizers, and error-bounded lossy compressors to bound the reconstruction error at high levels of compression. In the second step, we use constraint satisfaction post-processing followed by quantization to preserve the QoI. To illustrate the challenges of reducing the reconstruction errors of the PD and QoI, we focus on simulation data generated by a large-scale fusion code, XGC, which can produce tens of petabytes in a single day. The results show that our approach can achieve a high compression amount while accurately preserving the QoI within scientifically acceptable bounds.
Storage backends of parallel compute clusters are still based mostly on magnetic disks, while newer and faster storage technologies such as flash-based SSDs or non-volatile random access memory ...(NVRAM) are deployed within compute nodes. Including these new storage technologies into scientific workflows is unfortunately today a mostly manual task, and most scientists therefore do not take advantage of the faster storage media. One approach to systematically include nodelocal SSDs or NVRAMs into scientific workflows is to deploy ad hoc file systems over a set of compute nodes, which serve as temporary storage systems for single applications or longer-running campaigns. This paper presents results from the Dagstuhl Seminar 17202 “Challenges and Opportunities of User-Level File Systems for HPC” and discusses application scenarios as well as design strategies for ad hoc file systems using node-local storage media. The discussion includes open research questions, such as how to couple ad hoc file systems with the batch scheduling environment and how to schedule stage-in and stage-out processes of data between the storage backend and the ad hoc file systems. Also presented are strategies to build ad hoc file systems by using reusable components for networking and how to improve storage device compatibility. Various interfaces and semantics are presented, for example those used by the three ad hoc file systems BeeOND, GekkoFS, and BurstFS. Their presentation covers a range from file systems running in production to cutting-edge research focusing on reaching the performance limits of the underlying devices.
This article studies the I/O write behaviors of the Titan supercomputer and its Lustre parallel file stores under production load. The results can inform the design, deployment, and configuration of ...file systems along with the design of I/O software in the application, operating system, and adaptive I/O libraries.
We propose a statistical benchmarking methodology to measure write performance across I/O configurations, hardware settings, and system conditions. Moreover, we introduce two relative measures to quantify the write-performance behaviors of hardware components under production load. In addition to designing experiments and benchmarking on Titan, we verify the experimental results on one real application and one real application I/O kernel, XGC and HACC IO, respectively. These two are representative and widely used to address the typical I/O behaviors of applications.
In summary, we find that Titan’s I/O system is variable across the machine at fine time scales. This variability has two major implications. First, stragglers lessen the benefit of coupled I/O parallelism (striping). Peak median output bandwidths are obtained with parallel writes to many independent files, with no striping or write sharing of files across clients (compute nodes). I/O parallelism is most effective when the application—or its I/O libraries—distributes the I/O load so that each target stores files for multiple clients and each client writes files on multiple targets in a balanced way with minimal contention. Second, our results suggest that the potential benefit of dynamic adaptation is limited. In particular, it is not fruitful to attempt to identify “good locations” in the machine or in the file system: component performance is driven by transient load conditions and past performance is not a useful predictor of future performance. For example, we do not observe diurnal load patterns that are predictable.