The Ongoing Evolution of OpenMP de Supinski, Bronis R.; Scogland, Thomas R. W.; Duran, Alejandro ...
Proceedings of the IEEE, 11/2018, Volume: 106, Issue: 11
Journal Article
Peer reviewed
Open access
This paper presents an overview of the past, present, and future of the OpenMP application programming interface (API). While the API originally specified a small set of directives that guided shared-memory fork-join parallelization of loops and program sections, OpenMP now provides a richer set of directives that capture a wide range of parallelization strategies that are not strictly limited to shared memory. As we look toward the future of OpenMP, we immediately see further evolution of the support for that range of parallelization strategies and the addition of direct support for debugging and performance analysis tools. Looking beyond the next major release of the specification of the OpenMP API, we expect the specification eventually to include support for more parallelization strategies and to embrace closer integration into its Fortran, C, and, in particular, C++ base languages, which will likely require the API to adopt additional programming abstractions.
The TileDB array data storage manager Papadopoulos, Stavros; Datta, Kushal; Madden, Samuel ...
Proceedings of the VLDB Endowment,
12/2016, Volume: 10, Issue: 4
Journal Article
Peer reviewed
We present a novel storage manager for multi-dimensional arrays that arise in scientific applications, which is part of a larger scientific data management system called TileDB. In contrast to existing solutions, TileDB is optimized for both dense and sparse arrays. Its key idea is to organize array elements into ordered collections called fragments. Each fragment is dense or sparse, and groups contiguous array elements into data tiles of fixed capacity. The organization into fragments turns random writes into sequential writes, and, coupled with a novel read algorithm, leads to very efficient reads. TileDB enables parallelization via multi-threading and multi-processing, offering thread-/process-safety and atomicity via lightweight locking. We show that TileDB delivers comparable performance to the HDF5 dense array storage manager, while providing much faster random writes. We also show that TileDB offers substantially faster reads and writes than the SciDB array database system with both dense and sparse arrays. Finally, we demonstrate that TileDB is considerably faster than adaptations of the Vertica relational column-store for dense array storage management, and at least as fast for the case of sparse arrays.
The GraphBLAS is a standard API for expressing graphs in the language of linear algebra. The goal is to provide high performance while exploiting the fundamental simplicity of graph algorithms in terms of a common set of "Basic Linear Algebra Subprograms". A robust parallel implementation of the GraphBLAS C specification is available as the SuiteSparse GraphBLAS library [1]. The simplicity of the GraphBLAS, so apparent "in the math", is diminished when expressed in terms of a low-level language such as C. To see the full expressive power of the GraphBLAS, a high-level interface is needed so that the elegance of the mathematical underpinnings of the GraphBLAS is clearly apparent in the code. In this paper we introduce the Julia interface to the SuiteSparse:GraphBLAS library and compare it to the Python interface [2]. We implement the PageRank and Triangle Centrality algorithms with remarkably little code and show that no significant performance is sacrificed by moving from C to the more productive Python and Julia interfaces.
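The linear-algebra view of PageRank that the paper implements can be sketched in pure Python: each iteration is one sparse matrix-vector product over the edge list. This sketch only illustrates the math; it is not the SuiteSparse:GraphBLAS Julia or Python interface.

```python
# PageRank as repeated sparse matrix-vector products (pure-Python sketch
# of the linear-algebra formulation; not the GraphBLAS interfaces).

def pagerank(edges, n, damping=0.85, iters=50):
    outdeg = [0] * n
    for src, _ in edges:
        outdeg[src] += 1
    rank = [1.0 / n] * n
    for _ in range(iters):
        # new = (1-d)/n + d * A^T (rank / outdeg), computed edge by edge.
        new = [(1.0 - damping) / n] * n
        for src, dst in edges:
            new[dst] += damping * rank[src] / outdeg[src]
        rank = new
    return rank
```

Because the graph is held as a sparse edge list, the inner loop touches only nonzeros, which is exactly the structure the GraphBLAS exploits.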
Python is a widely used language in scientific computing. When the goal is high performance, however, Python lags far behind low-level languages such as C and Fortran. To support applications that stress performance, Python needs to access the full capabilities of modern CPUs. That means support for parallel multithreading. In this article, we describe PyOMP, a system that enables OpenMP in Python. Programmers write code in Python with OpenMP, Numba generates code that compiles to LLVM, and the resulting programs run with performance that approaches that from code written with C and OpenMP. In this article, we provide an update on the PyOMP project and explain how to install it and use it to write parallel multithreaded code in Python.
Towards a GraphBLAS Implementation for Go Costanza, Pascal; Hur, Ibrahim; Mattson, Timothy G.
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Conference Proceeding
The GraphBLAS are building blocks for constructing graph algorithms as linear algebra. They are defined mathematically with the goal that they would eventually map onto a variety of programming languages. Today they exist in C, C++, Python, MATLAB®, and Julia. In this paper, we describe the GraphBLAS for the Go programming language. A particularly interesting aspect of this work is that using the concurrency features of the Go language, we aim to build a runtime system that uses the GraphBLAS nonblocking mode by default.
Data pipelines are the new code. Consequently, data scientists need new tools to support the often time-consuming process of debugging their pipelines. We introduce Dagger, an end-to-end system to debug and mitigate data-centric errors in data pipelines, such as a data transformation gone wrong or a classifier underperforming due to noisy training data. Dagger supports inter-module debugging, where the pipeline blocks are treated as black boxes, as well as intra-module debugging, where users can debug data objects in Python scripts (e.g., DataFrames). In this demo, we will walk the audience through a rich, real-world business intelligence use case from our industrial collaborators at Intel to highlight how Dagger enables data scientists to productively identify and mitigate data-centric problems at different stages of pipeline development.
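The inter-module, black-box style of debugging the abstract describes can be illustrated with a toy harness that checkpoints every intermediate output so a failing stage can be localized. All names here are illustrative; none of this is the Dagger API.

```python
# Toy illustration of inter-module pipeline debugging: each stage is a
# black box, and the harness records every intermediate result so the
# stage where data went wrong can be inspected. (Not the Dagger API.)

def run_with_checkpoints(stages, data):
    checkpoints = [("input", data)]
    for name, fn in stages:
        data = fn(data)
        checkpoints.append((name, data))
    return data, checkpoints

# A miniature three-stage pipeline over raw string records.
stages = [
    ("clean",     lambda rows: [r.strip() for r in rows]),
    ("parse",     lambda rows: [int(r) for r in rows if r.isdigit()]),
    ("threshold", lambda nums: [n for n in nums if n >= 10]),
]
result, trace = run_with_checkpoints(stages, [" 12", "7 ", "oops", "30"])
```

Scanning `trace` shows, for example, that the record `"oops"` was silently dropped by the `parse` stage, which is the kind of data-centric error such a system surfaces.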
GraphBLAS: C++ Iterators for Sparse Matrices Brock, Benjamin; McMillan, Scott; Buluc, Aydin ...
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Conference Proceeding
Iteration over opaque, generic data structures is an important feature of many C++ libraries. Aggressive compiler optimization and inlining enables generic C++ iterators to iterate over complex data structures with performance comparable to that of hand-tuned code with C-language (raw) pointers. In this paper, we describe the sparse matrix iterators in the current draft of the C++ GraphBLAS API, their support for a variety of backend data formats, and implementation strategies we have considered. We compare performance of these iterators to that of hand-tuned iteration with raw pointers, showing that our iterators introduce minimal overhead. We consider extensions to our iterator design for interoperability with the draft C++ Graph Library proposal and to support different semantics for iterating over sparse matrices (by row, by column, by specific diagonals, etc.).
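The row-wise iteration over a sparse backend format that the paper discusses can be mimicked in plain Python with a generator over CSR arrays. This is a sketch of the iterator concept only, not the C++ GraphBLAS API.

```python
# Row-major iteration over a CSR sparse matrix, yielding (row, col, value)
# triples -- a plain-Python sketch of the iterator concept, not the draft
# C++ GraphBLAS API.

def csr_iter(indptr, indices, values):
    for row in range(len(indptr) - 1):
        # indptr[row]..indptr[row+1] spans this row's nonzeros.
        for k in range(indptr[row], indptr[row + 1]):
            yield row, indices[k], values[k]

# 3x3 matrix with nonzeros (0,1)=5, (1,0)=2, (1,2)=3, (2,2)=7
indptr  = [0, 1, 3, 4]
indices = [1, 0, 2, 2]
values  = [5, 2, 3, 7]
triples = list(csr_iter(indptr, indices, values))
```

A column-major or diagonal traversal, as the paper's extensions contemplate, would be a different generator over a correspondingly ordered backend format.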
Graphs play a key role in data analytics. Graphs and the software systems used to work with them are highly diverse. Algorithms interact with hardware in different ways, and which graph solution works best on a given platform changes with the structure of the graph. This makes it difficult to decide which graph programming framework is the best for a given situation. In this paper, we try to make sense of this diverse landscape. We evaluate five different frameworks for graph analytics: SuiteSparse GraphBLAS, Galois, the NWGraph library, the Graph Kernel Collection, and GraphIt. We use the GAP Benchmark Suite to evaluate each framework. GAP consists of 30 tests: six graph algorithms (breadth-first search, single-source shortest path, PageRank, betweenness centrality, connected components, and triangle counting) on five graphs. The GAP Benchmark Suite includes high-performance reference implementations to provide a performance baseline for comparison. Our results show the relative strengths of each framework, but also serve as a case study for the challenges of establishing objective measures for comparing graph frameworks.
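One of the six GAP kernels, triangle counting, is compact enough to sketch in plain Python. This is a reference-style sketch of the kernel's definition, not code from any of the benchmarked frameworks.

```python
# Triangle counting on an undirected graph via neighbor-set intersection
# -- one of the six GAP kernels, sketched in plain Python (not taken from
# any of the benchmarked frameworks).

def count_triangles(edges):
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    count = 0
    for u, v in edges:
        # Each triangle {u, v, w} is discovered once per edge, i.e. 3 times.
        count += len(neighbors[u] & neighbors[v])
    return count // 3
```

High-performance implementations reorder vertices by degree and intersect only "forward" neighbor lists to avoid the triple counting; the division by 3 here keeps the sketch short.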
The 48-core SCC Processor Mattson, Timothy G.; Riepen, Michael; Lehnig, Thomas ...
2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, 11/2010
Conference Proceeding
The number of cores integrated onto a single die is expected to climb steadily in the foreseeable future. This move to many-core chips is driven by a need to optimize performance per watt. How best to connect these cores and how to program the resulting many-core processor, however, is an open research question. Designs vary from GPUs to cache-coherent shared memory multiprocessors to pure distributed memory chips. The 48-core SCC processor reported in this paper is an intermediate case, sharing traits of message passing and shared memory architectures. The hardware has been described elsewhere. In this paper, we describe the programmer's view of this chip. In particular we describe RCCE: the native message passing model created for the SCC processor.
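The send/receive style of on-chip message passing that RCCE exposes can be sketched with threads and queues standing in for cores and message buffers. This is only an analogy for the programming model; it is not the RCCE API.

```python
# Sketch of the send/receive message-passing style that RCCE exposes,
# with threads and queues standing in for cores and on-chip message
# buffers (an analogy only; this is not the RCCE API).
import threading
import queue

def ping_pong(value):
    to_worker, to_main = queue.Queue(), queue.Queue()

    def worker():
        # "Core 1": blocking receive, compute, send the reply back.
        msg = to_worker.get()
        to_main.put(msg * 2)

    t = threading.Thread(target=worker)
    t.start()
    to_worker.put(value)      # "Core 0" sends ...
    reply = to_main.get()     # ... and blocks until the reply arrives.
    t.join()
    return reply
```

The blocking `get` mirrors RCCE's synchronous receive: the receiving side makes no progress until a matching send deposits data in the shared buffer.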
The analysis of graphs has become increasingly important to a wide range of applications. Graph analysis presents a number of unique challenges in the areas of (1) software complexity, (2) data complexity, (3) security, (4) mathematical complexity, (5) theoretical analysis, (6) serial performance, and (7) parallel performance. Implementing graph algorithms using matrix-based approaches provides a number of promising solutions to these challenges. The GraphBLAS standard (istc-bigdata.org/GraphBlas) is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. The GraphBLAS mathematically defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the GraphBLAS and describes how the GraphBLAS can be used to address many of the challenges associated with analysis of graphs.
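The core idea behind these matrix-based graph operations, traversal expressed as matrix-vector products over a Boolean semiring, can be sketched in plain Python. The sketch below illustrates the math only; it is not the GraphBLAS API.

```python
# Breadth-first search as repeated "matrix-vector products" over a Boolean
# semiring: each step maps the current frontier through the adjacency
# structure, masked by unvisited vertices. (Plain-Python sketch of the
# GraphBLAS idea, not its API.)

def bfs_levels(adj, source):
    n = len(adj)                 # adj[u] = set of out-neighbors of u
    level = [-1] * n             # -1 marks an unvisited vertex
    level[source] = 0
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # One Boolean mat-vec: expand the frontier, keep only unvisited.
        frontier = {v for u in frontier for v in adj[u] if level[v] == -1}
        for v in frontier:
            level[v] = depth
    return level
```

In a real GraphBLAS implementation the set comprehension becomes a masked `mxv` over an OR-AND semiring, which is what lets one BFS formulation run unchanged across many execution environments.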