Many-core chips are changing the way high-performance computing systems are built and programmed. As it is becoming increasingly difficult to maintain cache coherence across many cores, manufacturers are exploring designs that do not feature any cache coherence between cores. Communications on such chips are naturally implemented using message passing, which makes them resemble clusters, but with an important difference: special hardware can be provided that supports very fast on-chip communications, reducing latency and increasing bandwidth. We present one such chip, the Single-Chip Cloud Computer (SCC), an experimental processor created by Intel Labs. We describe two communication libraries available on SCC: RCCE and Rckmb. RCCE is a light-weight, minimal library for writing message passing parallel applications. Rckmb provides the data link layer for running network services such as TCP/IP. Both utilize SCC's non-cache-coherent shared memory to transfer data between cores without going off-chip. In this paper we describe the design and implementation of RCCE and Rckmb. To compare their performance, we consider simple benchmarks run with RCCE, and MPI over TCP/IP.
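As a rough illustration of the programming model this abstract describes, the sketch below shows a blocking send/receive between two SCC cores written against RCCE's basic C interface. The entry-point name (RCCE_APP) and the exact signatures are assumptions drawn from the library's public documentation, not from this paper.

```c
/* Minimal sketch of two-sided message passing in the style of RCCE's
 * basic interface; signatures are assumptions based on the library's
 * documented C API, not a verified excerpt. */
#include <stdio.h>
#include <string.h>
#include "RCCE.h"

int RCCE_APP(int argc, char **argv) {
    RCCE_init(&argc, &argv);

    int me  = RCCE_ue();        /* this core's rank ("unit of execution") */
    int num = RCCE_num_ues();   /* total cores participating              */
    char msg[32];

    if (num < 2) {
        if (me == 0) printf("need at least two cores\n");
    } else if (me == 0) {
        /* Core 0 sends a greeting to core 1 through the on-chip
         * message-passing buffers; no off-chip memory traffic needed. */
        strcpy(msg, "hello from core 0");
        RCCE_send(msg, sizeof(msg), 1);
    } else if (me == 1) {
        RCCE_recv(msg, sizeof(msg), 0);
        printf("core %d of %d received: %s\n", me, num, msg);
    }

    RCCE_finalize();
    return 0;
}
```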
Using the Parallel Research Kernels to Study PGAS Models
Van Der Wijngaart, Rob F.; Sridharan, Srinivas; Kayi, Abdullah ...
2015 9th International Conference on Partitioned Global Address Space Programming Models,
09/2015
Conference Proceeding
A subset of the Parallel Research Kernels (PRK), simplified parallel application patterns, is used to study the behavior of different runtimes implementing the PGAS programming model. The goal of this paper is to show that such an approach is practical and effective as we approach the exascale era. Our experimental results indicate that, for the kernels we selected, MPI with two-sided communications outperforms the PGAS runtimes SHMEM, UPC, Grappa, and MPI-3 with RMA extensions.
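For context, the PRK distill common communication patterns into minimal kernels. The hypothetical, much-simplified C sketch below shows the kind of two-sided MPI halo exchange such kernels exercise; the grid size and variable names are invented for illustration.

```c
/* Simplified PRK-style halo exchange using the two-sided MPI calls
 * that the paper found to outperform the PGAS runtimes on the
 * selected kernels. Compile with mpicc. */
#include <mpi.h>
#include <stdio.h>

#define N 1024  /* local grid points per rank (illustrative size) */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double grid[N + 2];                       /* interior + two ghost cells */
    for (int i = 0; i < N + 2; i++) grid[i] = rank;

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Two-sided boundary exchange: each rank swaps edge values with
     * its neighbors; MPI_Sendrecv pairs the send and the matching
     * receive in one call, avoiding deadlock. */
    MPI_Sendrecv(&grid[N], 1, MPI_DOUBLE, right, 0,
                 &grid[0], 1, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&grid[1],     1, MPI_DOUBLE, left,  1,
                 &grid[N + 1], 1, MPI_DOUBLE, right, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    if (rank == 0)
        printf("rank 0 ghost cell from right neighbor: %f\n", grid[N + 1]);

    MPI_Finalize();
    return 0;
}
```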
In high-performance computing (HPC), the demand for efficient parallel programming models has grown dramatically since the end of Dennard Scaling and the subsequent move to multi-core CPUs. OpenMP stands out as a popular choice due to its simplicity and portability, offering a directive-driven approach for shared-memory parallel programming. Despite its wide adoption, however, there is a lack of comprehensive data on the actual usage of OpenMP constructs, hindering unbiased insights into its popularity and evolution. This paper presents a statistical analysis of OpenMP usage and adoption trends based on a novel and extensive database, HPCORPUS, compiled from GitHub repositories containing C, C++, and Fortran code. The results reveal that OpenMP is the dominant parallel programming model, accounting for 45% of all analyzed parallel APIs. Furthermore, it has demonstrated steady and continuous growth in popularity over the past decade. Analyzing specific OpenMP constructs, the study provides in-depth insights into their usage patterns and preferences across the three languages. Notably, we found that while OpenMP has a strong "common core" of constructs in heavy use (the rest of the API is used far less), there are new adoption trends as well, such as the simd and target directives for accelerated computing and task for irregular parallelism. Overall, this study sheds light on OpenMP's significance in HPC applications and provides valuable data for researchers and practitioners. It showcases OpenMP's versatility, evolving adoption, and relevance in contemporary parallel programming, underlining its continued role in HPC applications and beyond. These statistical insights are essential for making informed decisions about parallelization strategies and provide a foundation for further advancements in parallel programming models and techniques. HPCORPUS, as well as the analysis scripts and raw results, are available at: https://github.com/Scientific-Computing-Lab-NRCN/HPCorpus
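To make the construct categories concrete, the illustrative C program below contrasts a "common core" worksharing loop with the simd, task, and target directives whose growing adoption the paper reports; the loops and variable names are invented for illustration.

```c
/* Illustrative snippets contrasting OpenMP's "common core" with the
 * newer constructs highlighted in the paper. Compile with -fopenmp. */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N], b[N];   /* static: too large for the stack */
    double sum = 0.0;

    /* Common core: worksharing loop with a reduction. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i] = 0.5 * i;
        sum += a[i];
    }

    /* Newer adoption trend: explicit vectorization with simd. */
    #pragma omp simd
    for (int i = 0; i < N; i++)
        b[i] = 2.0 * a[i];

    /* Newer adoption trend: task for irregular parallelism. */
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
        printf("task ran on thread %d\n", omp_get_thread_num());
        #pragma omp taskwait
    }

    /* Newer adoption trend: target offload to an accelerator
     * (falls back to the host if no device is available). */
    #pragma omp target map(tofrom: b[0:N])
    for (int i = 0; i < N; i++)
        b[i] += 1.0;

    printf("sum = %f, b[0] = %f\n", sum, b[0]);
    return 0;
}
```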
Quantum computing represents a paradigm shift for computation requiring an entirely new computer architecture. However, there is much that can be learned from traditional classical computer ...engineering. In this paper, we describe the Parallel Research Kernels (PRK), a tool that was very useful for designing classical parallel computing systems. The PRK are simple kernels written to expose bottlenecks that limit classical parallel computing performance. We hypothesize that an analogous tool for quantum computing, Quantum Research Kernels (QRK), may similarly aid the co-design of software and hardware for quantum computing systems, and we give a few examples of representative QRKs.
Computational science has made rapid progress in recent years, leading to ever-increasing demand for supercomputing resources. For scientific applications that leverage such resources, the Message Passing Interface (MPI) plays a crucial role in enabling distributed-memory parallelization across multiple nodes. However, writing MPI code manually, and specifically, performing domain decomposition, is a challenging and error-prone task.
In this paper, we address this problem by developing MPI-rical, a novel data-driven, programming-assistance tool that assists programmers in writing domain decomposition based distributed memory parallelization code using MPI. Specifically, we leverage the Transformer architecture, which drove recent advances in natural language processing (NLP), together with a supervised language model to suggest MPI functions and their proper locations in the code on the fly. In addition to the novel model for MPI-based parallel programming, we also introduce MPICodeCorpus, the first publicly available corpus of MPI-based parallel programs, created by mining more than 15,000 open-source repositories on GitHub. Experimental results demonstrate the effectiveness of MPI-rical both on the MPICodeCorpus dataset and, more importantly, on a compiled benchmark of MPI-based parallel programs for numerical computations that represent real-world scientific applications. Specifically, MPI-rical achieves F1 scores between 0.87 and 0.91 on these programs, demonstrating its accuracy in suggesting correct MPI functions at appropriate code locations. The source code used in this work, as well as other relevant sources, is available at: https://github.com/Scientific-Computing-Lab-NRCN/MPI-rical.
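To illustrate the task MPI-rical automates, the hand-written C sketch below shows a serial reduction loop after domain decomposition across ranks, with the inserted MPI calls marked in comments. It is an illustration of the transformation, not output from the tool.

```c
/* Hand-written sketch of the kind of transformation MPI-rical targets:
 * a serial reduction loop decomposed across ranks, with the MPI calls
 * a programmer (or the tool) must insert marked "suggested". */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);                 /* suggested: initialize MPI */

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* suggested: get this rank  */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* suggested: get rank count */

    /* Domain decomposition: split the index range [0, N) into
     * contiguous blocks, one per rank. */
    int chunk = (N + size - 1) / size;
    int lo = rank * chunk;
    int hi = (lo + chunk < N) ? lo + chunk : N;

    double local = 0.0, total = 0.0;
    for (int i = lo; i < hi; i++)           /* originally: i = 0 .. N-1  */
        local += 1.0 / (1.0 + i);

    /* suggested: combine the per-rank partial sums on rank 0 */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %f\n", total);

    MPI_Finalize();                         /* suggested: shut down MPI  */
    return 0;
}
```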
How Good is OpenMP
Mattson, Timothy G.
Scientific programming,
2003, Volume: 11, Issue: 2
Journal Article
Peer-reviewed
Open access
The OpenMP standard defines an Application Programming Interface (API) for shared memory computers. Since its introduction in 1997, it has grown to become one of the most commonly used APIs for parallel programming. But success in the market doesn't necessarily imply successful computer science. Is OpenMP a "good" programming environment? What does it even mean to call a programming environment good? And finally, once we understand how good or bad OpenMP is, what can we do to make it even better? In this paper, we address these questions.