We present ADIOS 2, the latest version of the Adaptable Input Output (I/O) System. ADIOS 2 addresses scientific data management needs ranging from scalable I/O on supercomputers to data analysis in personal computer and cloud systems. Version 2 introduces a unified application programming interface (API) that enables seamless data movement through files, wide-area networks, and direct memory access, as well as high-level APIs for data analysis. The internal architecture provides a set of reusable and extendable components for managing data presentation and transport mechanisms for new applications. ADIOS 2 bindings are available in C++11, C, Fortran, Python, and MATLAB and are currently used across different scientific communities. ADIOS 2 provides a communal framework to tackle data management challenges as we approach the exascale era of supercomputing.
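The unified-API idea in this abstract can be illustrated with a short sketch. The classes and method names below are hypothetical stand-ins, not the actual ADIOS 2 API: the point is only that one write call dispatches to interchangeable transport back ends (file, memory, network), so application code never changes when the transport does.

```python
# Hypothetical sketch of a unified I/O API with pluggable transports,
# loosely modeled on the component architecture described above.
# Not the real ADIOS 2 API.

class Transport:
    def put(self, name, data):
        raise NotImplementedError

class FileTransport(Transport):
    """Accumulates variables as if writing them to a file."""
    def __init__(self):
        self.store = {}
    def put(self, name, data):
        self.store[name] = list(data)

class MemoryTransport(Transport):
    """Hands data directly to an in-memory consumer (e.g. an analysis task)."""
    def __init__(self, consumer):
        self.consumer = consumer
    def put(self, name, data):
        self.consumer(name, data)

class IO:
    """One API, many transports: callers never change their write code."""
    def __init__(self, transport):
        self.transport = transport
    def write(self, name, data):
        self.transport.put(name, data)

# The same application-side write works unchanged for both transports:
ft = FileTransport()
received = []
mt = MemoryTransport(lambda n, d: received.append((n, d)))
for t in (ft, mt):
    IO(t).write("temperature", [1.0, 2.0, 3.0])
```

Swapping `FileTransport` for `MemoryTransport` is the sketch's analogue of moving data through files versus direct memory access without touching application code.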
The applications being developed within the U.S. Exascale Computing Project (ECP) to run on imminent Exascale computers will generate scientific results with unprecedented fidelity and record turn-around time. Many of these codes are based on particle-mesh methods and use advanced algorithms, especially dynamic load-balancing and mesh-refinement, to achieve high performance on Exascale machines. Yet, as such algorithms improve parallel application efficiency, they raise new challenges for I/O logic due to their irregular and dynamic data distributions. Thus, while the enormous data rates of Exascale simulations already challenge existing file system write strategies, the need for efficient read and processing of generated data introduces additional constraints on the data layout strategies that can be used when writing data to secondary storage. We review these I/O challenges and introduce two online data layout reorganization approaches for achieving good tradeoffs between read and write performance. We demonstrate the benefits of using these two approaches for the ECP particle-in-cell simulation WarpX, which serves as a motif for a large class of important Exascale applications. We show that by understanding application I/O patterns and carefully designing data layouts we can increase read performance by more than 80 percent.
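The layout-reorganization idea can be sketched in miniature. This is an illustrative toy, not the paper's actual algorithm: particle records arrive grouped by writer rank, but readers query by spatial cell, so re-bucketing the records by cell before they reach storage lets a cell read touch one contiguous range instead of every rank's block.

```python
# Toy sketch of online data layout reorganization (illustrative only).
# Records arrive per writer rank as (cell_id, value) pairs; we rewrite
# them in cell order so spatial reads become contiguous.

def reorganize(per_rank_particles, num_cells):
    """per_rank_particles: one list of (cell_id, value) records per rank."""
    buckets = [[] for _ in range(num_cells)]
    for rank_block in per_rank_particles:
        for cell_id, value in rank_block:
            buckets[cell_id].append(value)
    # Flatten buckets into the on-disk order, remembering offsets per cell.
    layout, offsets, pos = [], [], 0
    for b in buckets:
        offsets.append((pos, pos + len(b)))
        layout.extend(b)
        pos += len(b)
    return layout, offsets

def read_cell(layout, offsets, cell_id):
    lo, hi = offsets[cell_id]
    return layout[lo:hi]   # one contiguous read instead of one per rank

# Two ranks wrote particles for three cells in arrival order:
ranks = [[(0, "a"), (2, "b")], [(1, "c"), (0, "d")]]
layout, offsets = reorganize(ranks, 3)
```

The write side pays the cost of the shuffle once, online; every later read of a cell range then scans one interval, which is the read/write tradeoff the abstract describes.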
A key trend facing extreme-scale computational science is the widening gap between computational and I/O rates, and the challenge that follows is how to best gain insight from simulation data when it is increasingly impractical to save it to persistent storage for subsequent visual exploration and analysis. One approach to this challenge is centered around the idea of in situ processing, where visualization and analysis processing is performed while data is still resident in memory. This paper examines several key design and performance issues related to the idea of in situ processing at extreme scale on modern platforms: scalability, overhead, performance measurement and analysis, comparison and contrast with a traditional post hoc approach, and interfacing with simulation codes. We illustrate these principles in practice with studies, conducted on large-scale HPC platforms, that include a miniapplication and multiple science application codes, one of which demonstrates in situ methods in use at greater than 1M-way concurrency.
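The in situ idea can be reduced to a minimal sketch (all names here are hypothetical): analysis callbacks run on each simulation step while the data is still in memory, so only small derived results ever need to be persisted, instead of writing every full field to disk for post hoc processing.

```python
# Minimal sketch of in situ processing (hypothetical interface): hooks
# analyze each step's data in place, avoiding per-step file I/O.

def simulate(steps, in_situ_hooks):
    field = [0.0] * 4
    results = []
    for step in range(steps):
        field = [v + 1.0 for v in field]      # stand-in for the solver
        for hook in in_situ_hooks:            # analyze while in memory
            results.append(hook(step, field))
    return results

# Example hook: reduce the full field to one statistic per step,
# shrinking what must ever reach persistent storage.
mean_hook = lambda step, field: (step, sum(field) / len(field))
stats = simulate(3, [mean_hook])
```

The tradeoffs the paper studies (overhead, interfacing with the simulation, comparison with post hoc) all live in where and how often such hooks run.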
Upcoming exascale applications could introduce significant data management challenges due to their large sizes, dynamic work distribution, and involvement of accelerators such as graphics processing units (GPUs). In this work, we explore the performance of reading and writing operations involving one such scientific application on two different supercomputers. Our tests showed that the Adaptable Input and Output System (ADIOS) was able to achieve speeds over 1 TB/s, a significant fraction of the peak I/O performance on Summit. We also demonstrated that the querying functionality in ADIOS could effectively support common selective data analysis operations, such as conditional histograms. In tests, this query mechanism was able to reduce the execution time by a factor of five. More importantly, the ADIOS data management framework allowed us to achieve these performance improvements with only a minimal amount of coding effort.
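A conditional histogram, the selective operation named above, is easy to state concretely. The sketch below is a plain illustration of the operation itself, not the ADIOS query mechanism: only records satisfying a predicate contribute to the counts, so a query layer that filters records before they reach the analysis does the bulk of the work.

```python
# Illustrative conditional histogram: count only values that pass a
# predicate, binned by half-open intervals [edge_i, edge_{i+1}).
# Not the ADIOS implementation.

def conditional_histogram(values, predicate, bin_edges):
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        if not predicate(v):
            continue
        for i in range(len(counts)):
            if bin_edges[i] <= v < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts

data = [0.5, 1.5, 2.5, 3.5, 9.0]
# Histogram of values below 4, in bins [0, 2) and [2, 4):
hist = conditional_histogram(data, lambda v: v < 4.0, [0.0, 2.0, 4.0])
```

When the storage layer can evaluate the predicate itself (as the abstract describes), the filtered-out records never need to be read at all, which is where the factor-of-five speedup comes from.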
Numerous studies have shown that atmospheric models with high horizontal resolution better represent the physics and statistics of precipitation in climate models. While it is abundantly clear from these studies that high resolution increases the rate of extreme precipitation, it is not clear whether these added extreme events are "realistic"; that is, whether they occur in simulations in response to the same forcings that drive similar events in reality. In order to understand whether increasing horizontal resolution results in improved model fidelity, a hindcast-based, multiresolution experimental design has been conceived and implemented: the InitiaLIzed-ensemble, Analyze, and Develop (ILIAD) framework. The ILIAD framework allows direct comparison between observed and simulated weather events across multiple resolutions and assessment of the degree to which increased resolution improves the fidelity of extremes. Analysis of 5 years of daily 5-day hindcasts with the Community Earth System Model at horizontal resolutions of 220, 110, and 28 km shows that: (1) these hindcasts reproduce the resolution-dependent increase of extreme precipitation that has been identified in longer-duration simulations; (2) the correspondence between simulated and observed extreme precipitation improves as resolution increases; and (3) this increase in extremes and precipitation fidelity comes entirely from resolved-scale precipitation. Evidence is presented that this resolution-dependent increase in precipitation intensity can be explained by the theory of Rauscher et al. (), which states that precipitation intensifies at high resolution due to an interaction between the emergent scaling (spectral) properties of the wind field and the constraint of fluid continuity.
Key Points
Hindcasts reproduce the resolution‐dependence of precipitation observed for longer simulations
The fidelity of extremes improves as resolution increases
The improvement of extremes comes entirely from resolved‐scale precipitation
Scientific discoveries are increasingly relying on analysis of massive amounts of data. The ability to directly access the most relevant data records through a query, without sifting through all of them, becomes essential. However, scientific datasets are commonly stored on parallel file systems and I/O systems that are optimized for reading/writing large chunks of data, and many scientific datasets have spatial-temporal data similarity, such that records with similar values are often located in close proximity to each other. Therefore, our previous work began to investigate the benefit of using a block range index technique for scientific datasets, which records only the value range of all the records in a data block. In this paper, we extend our work in several aspects. First, we implement and integrate our block index technique with the ADIOS I/O system. Second, we show that our proposed method can be significantly better than the existing min-max and bitmap indexing methods supported in ADIOS, and can also have comparable performance in the worst case. Third, we propose several techniques that take advantage of the block index information to greatly reduce data retrieval time for query results. Fourth, we evaluate our approach using several real scientific datasets, and analyze the spatial-temporal data similarity characteristics in them. Through our study, we believe the block index can be an effective indexing technique for scientific datasets with little implementation and operating overhead. Its size is small enough for building the indexes on-the-fly, and yet its query information is sufficient for efficient data access.
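The block range index described above can be sketched in a few lines (illustrative, not the paper's implementation): each fixed-size block records only the minimum and maximum of its values, and a range query scans only blocks whose [min, max] interval overlaps the query range. Spatial-temporal similarity is what makes this effective, since similar values cluster into the same blocks.

```python
# Sketch of a block range index: per-block (start, min, max) triples,
# used to skip blocks that cannot contain values in the query range.

def build_block_index(values, block_size):
    index = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        index.append((start, min(block), max(block)))
    return index

def range_query(values, index, block_size, lo, hi):
    hits, blocks_read = [], 0
    for start, bmin, bmax in index:
        if bmax < lo or bmin > hi:
            continue                      # whole block skipped via the index
        blocks_read += 1
        hits.extend(v for v in values[start:start + block_size]
                    if lo <= v <= hi)
    return hits, blocks_read

# Spatially clustered data (similar values adjacent) skips most blocks:
data = [1, 2, 2, 1, 8, 9, 9, 8, 4, 5, 5, 4]
idx = build_block_index(data, 4)
hits, blocks_read = range_query(data, idx, 4, 8, 9)
```

The index holds two scalars per block, which is why it is cheap enough to build on the fly, yet a selective query over clustered data reads only a fraction of the blocks.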
Many science applications increasingly make use of data-intensive methods that require bulk data movement such as staging of large datasets in preparation for analysis on shared computational resources, remote access to large datasets, and data dissemination. Over the next 5 to 10 years, these datasets are projected to grow to exabytes of data, and continued scientific progress will depend on efficient methods for data movement between high performance computing centers. We study two techniques that improve the use of available resources for large, long-running, multi-file transfers. First, we show the effect of adaptation of transfer parameters for multi-file transfers, where the adaptation is based on recent performance. Second, we use Virtual Organization and site policies to influence the allocation of resources such as available transfer streams to clients. We show that these techniques improve completion times for large multi-file data transfers by approximately 20% over resource constrained infrastructure.
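The first technique, adapting transfer parameters from recent performance, can be sketched as a simple feedback controller. The names and the one-step policy below are hypothetical illustrations, not the paper's actual controller: after each file, the number of concurrent streams is nudged up if throughput improved and down if it degraded.

```python
# Hypothetical sketch of performance-based adaptation for multi-file
# transfers: adjust stream count from the last two throughput samples.

def adapt_streams(streams, prev_throughput, cur_throughput,
                  min_streams=1, max_streams=16):
    if cur_throughput > prev_throughput:
        streams += 1                                 # recent change helped
    elif cur_throughput < prev_throughput:
        streams = max(min_streams, streams - 1)      # back off
    return min(streams, max_streams)

# Simulated feedback loop over measured per-file throughputs (MB/s):
throughputs = [100, 120, 150, 140, 160]
streams, prev = 4, throughputs[0]
history = []
for t in throughputs[1:]:
    streams = adapt_streams(streams, prev, t)
    history.append(streams)
    prev = t
```

The second technique in the abstract would act on the same knob from the other side: site and Virtual Organization policies capping `max_streams` per client rather than per-transfer measurement.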
On-line analytical processing (OLAP) systems based on a dimensional view of data have found widespread use in business applications and are being used increasingly in non-standard applications. These systems provide good performance and ease-of-use. However, the complex structures and relationships inherent in data in non-standard applications are not accommodated well by OLAP systems. In contrast, object database systems are built to handle such complexity, but do not support OLAP-type querying well.
This paper presents the concepts and techniques underlying a flexible, “multi-model” federated system that enables OLAP users to exploit simultaneously the features of OLAP and object systems. The system allows data to be handled using the most appropriate data model and technology: OLAP systems for dimensional data and object database systems for more complex, general data. This allows data analysis on the OLAP data to be significantly enriched by the use of additional object data. Additionally, physical integration of the OLAP and the object data can be avoided. As a vehicle for demonstrating the capabilities of the system, a prototypical OLAP language is defined and extended to naturally support queries that involve data in object databases. The language permits selection criteria that reference object data, queries that return combinations of OLAP and object data, and queries that group dimensional data according to object data. The system is designed to be aggregation-safe, in the sense that it exploits the aggregation semantics of the data to prevent incorrect or meaningless query results. These capabilities may also be integrated into existing languages. It is shown how to integrate relational and XML data using the technology. A prototype implementation of the system is reported, along with performance measurements that show that the approach is a viable alternative to a physically integrated data warehouse.
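The core federation capability, grouping dimensional data according to object data, can be shown with a toy example. The schema and function below are hypothetical illustrations, not the paper's language or system: facts live in an OLAP-style store, richer attributes live in an object store, and a query aggregates cube measures by an attribute reached through the object graph.

```python
# Toy sketch of multi-model federation (hypothetical schema): group OLAP
# facts by an attribute navigated out of an object database.

sales = [  # (product_id, amount) -- dimensional fact data
    ("p1", 10), ("p2", 5), ("p1", 7), ("p3", 3),
]
products = {  # object data with structure the cube lacks
    "p1": {"supplier": {"region": "EU"}},
    "p2": {"supplier": {"region": "US"}},
    "p3": {"supplier": {"region": "EU"}},
}

def group_by_object_attr(facts, objects, path):
    """Sum fact measures grouped by an attribute path into the objects."""
    totals = {}
    for key, amount in facts:
        obj = objects[key]
        for step in path:        # walk the object graph, e.g. supplier.region
            obj = obj[step]
        totals[obj] = totals.get(obj, 0) + amount
    return totals

totals = group_by_object_attr(sales, products, ["supplier", "region"])
```

No physical integration happens here: the facts and objects stay in their own stores, and only the query result combines them, which mirrors the federated design the abstract argues for. Aggregation safety, in the paper's sense, would add checks that the navigated attribute is a legal grouping level for the measure being summed.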