The four-volume set LNCS 13350, 13351, 13352, and 13353 constitutes the proceedings of the 22ndt International Conference on Computational Science, ICCS 2022, held in London, UK, in June 2022.* The ...total of 175 full papers and 78 short papers presented in this book set were carefully reviewed and selected from 474 submissions. 169 full and 36 short papers were accepted to the main track; 120 full and 42 short papers were accepted to the workshops/ thematic tracks. *The conference was held in a hybrid format This is an open access book.
Fail-stop failures in distributed environments are often tolerated by checkpointing or message logging. In this paper, we show that fail-stop process failures in ScaLAPACK matrix-matrix ...multiplication kennel can be tolerated without checkpointing or message logging. It has been proved in previous algorithm-based fault tolerance that, for matrix-matrix multiplication, the checksum relationship in the input checksum matrices is preserved at the end of the computation no mater which algorithm is chosen. From this checksum relationship in the final computation results, processor miscalculations can be detected, located, and corrected at the end of the computation. However, whether this checksum relationship can be maintained in the middle of the computation or not remains open. In this paper, we first demonstrate that, for many matrix matrix multiplication algorithms, the checksum relationship in the input checksum matrices is not maintained in the middle of the computation. We then prove that, however, for the outer product version algorithm, the checksum relationship in the input checksum matrices can be maintained in the middle of the computation. Based on this checksum relationship maintained in the middle of the computation, we demonstrate that fail-stop process failures (which are often tolerated by checkpointing or message logging) in ScaLAPACK matrix-matrix multiplication can be tolerated without checkpointing or message logging.
New high-performance computing system designs with steeply escalating processor and core counts, burgeoning heterogeneity and accelerators, and increasingly unpredictable memory access times call for ...one or more dramatically new programming paradigms. These new approaches must react and adapt quickly to unexpected contentions and delays, and they must provide the execution environment with sufficient intelligence and flexibility to rearrange the execution to improve resource utilization. The authors present an approach based on task parallelism that reveals the application's parallelism by expressing its algorithm as a task flow. This strategy allows the algorithm to be decoupled from the data distribution and the underlying hardware, since the algorithm is entirely expressed as flows of data. This kind of layering provides a clear separation of concerns among architecture, algorithm, and data distribution. Developers benefit from this separation because they can focus solely on the algorithmic level without the constraints involved with programming for current and future hardware trends.
Big data and extreme-scale computing Asch, M; Moore, T; Badia, R ...
The international journal of high performance computing applications,
07/2018, Letnik:
32, Številka:
4
Journal Article
Recenzirano
Over the past four years, the Big Data and Exascale Computing (BDEC) project organized a series of five international workshops that aimed to explore the ways in which the new forms of data-centric ...discovery introduced by the ongoing revolution in high-end data analysis (HDA) might be integrated with the established, simulation-centric paradigm of the high-performance computing (HPC) community. Based on those meetings, we argue that the rapid proliferation of digital data generators, the unprecedented growth in the volume and diversity of the data they generate, and the intense evolution of the methods for analyzing and using that data are radically reshaping the landscape of scientific computing. The most critical problems involve the logistics of wide-area, multistage workflows that will move back and forth across the computing continuum, between the multitude of distributed sensors, instruments and other devices at the networks edge, and the centralized resources of commercial clouds and HPC centers. We suggest that the prospects for the future integration of technological infrastructures and research ecosystems need to be considered at three different levels. First, we discuss the convergence of research applications and workflows that establish a research paradigm that combines both HPC and HDA, where ongoing progress is already motivating efforts at the other two levels. Second, we offer an account of some of the problems involved with creating a converged infrastructure for peripheral environments, that is, a shared infrastructure that can be deployed throughout the network in a scalable manner to meet the highly diverse requirements for processing, communication, and buffering/storage of massive data workflows of many different scientific domains. Third, we focus on some opportunities for software ecosystem convergence in big, logically centralized facilities that execute large-scale simulations and models and/or perform large-scale data analytics. We close by offering some conclusions and recommendations for future investment and policy review.
Current climate models have a limited ability to increase spatial resolution because numerical stability requires the time step to decrease. We describe a semi-Lagrangian method for tracer transport ...that is stable for arbitrary Courant numbers, and we test a parallel implementation discretized on the cubed sphere. The method includes a fixer that conserves mass and constrains tracers to a physical range of values. The method shows third-order convergence and maintains nonlinear tracer correlations to second order. It shows optimal accuracy at Courant numbers of 10–20, more than an order of magnitude higher than explicit methods. We present parallel performance in terms of strong scaling, weak scaling, and spatial scaling (where the time step stays constant while the resolution increases). For a 0.2° test with 100 tracers, the implementation scales efficiently to 10,000 MPI tasks.
This paper describes the automatically tuned linear algebra software (ATLAS) project, as well as the fundamental principles that underly it. ATLAS is an instantiation of a new paradigm in high ...performance library production and maintenance, which we term automated empirical optimization of software (AEOS); this style of library management has been created in order to allow software to keep pace with the incredible rate of hardware advancement inherent in Moore's Law. ATLAS is the application of this new paradigm to linear algebra software, with the present emphasis on the basic linear algebra subprograms (BLAS), a widely used, performance-critical, linear algebra kernel library.
One of the main obstacles to the efficient solution of scientific problems is the problem of tuning software, both to the available architecture and to the user problem at hand. We describe ...approaches for obtaining tuned high-performance kernels and for automatically choosing suitable algorithms. Specifically, we describe the generation of dense and sparse Basic Linear Algebra Subprograms (BLAS) kernels, and the selection of linear solver algorithms. However, the ideas presented here extend beyond these areas, which can be considered proof of concept.