Peer-reviewed, Open access
  • Kokkos: Enabling manycore p...
    Carter Edwards, H.; Trott, Christian R.; Sunderland, Daniel

    Journal of parallel and distributed computing, 12/2014, Volume: 74, Issue: 12
    Journal Article

    The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. A major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models address manycore parallelism (e.g., OpenMP, OpenACC, OpenCL) but fail to address memory access patterns. The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diverse manycore architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this paper we describe Kokkos’ abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. The Kokkos library is under active research and development to incorporate capabilities from new generations of manycore architectures, and to address a growing list of applications and domain libraries.

  • We developed a performance portable programming model (PM) for manycore devices.
  • Unifying parallel dispatch and data layout is mandatory for performance portability.
  • The Kokkos C++ library implements this PM with pthreads, OpenMP, and CUDA back-ends.
  • Demonstrate Xeon Phi and NVIDIA GPU performance portability with mini-applications.
  • Recommend a strategy for legacy application codes to migrate to manycore.
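    The abstract's central idea, that a kernel should be written once while the memory layout is chosen per device, can be illustrated with a minimal sketch in plain C++. This is not the Kokkos API: the names RowMajor, ColMajor, Array2D, for_each_index, and fill_and_sum are hypothetical, and the dispatch function is a serial stand-in for a real parallel back-end.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies (illustrative names, not the Kokkos API).
// Each maps a logical (i, j) index to an offset in flat storage.
struct RowMajor {
    static std::size_t index(std::size_t i, std::size_t j,
                             std::size_t /*rows*/, std::size_t cols) {
        return i * cols + j;  // consecutive j are adjacent in memory
    }
};

struct ColMajor {
    static std::size_t index(std::size_t i, std::size_t j,
                             std::size_t rows, std::size_t /*cols*/) {
        return j * rows + i;  // consecutive i are adjacent in memory
    }
};

// Hypothetical 2D array whose memory layout is a compile-time parameter.
template <typename Layout>
class Array2D {
public:
    Array2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[Layout::index(i, j, rows_, cols_)];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    const std::vector<double>& storage() const { return data_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};

// Serial stand-in for a parallel dispatch over the index range [0, n).
// A real back-end would distribute these iterations across threads.
template <typename Body>
void for_each_index(std::size_t n, Body body) {
    for (std::size_t i = 0; i < n; ++i) body(i);
}

// Kernel written once against logical indices; the layout is never named.
template <typename Layout>
double fill_and_sum(Array2D<Layout>& a) {
    for_each_index(a.rows(), [&](std::size_t i) {
        for (std::size_t j = 0; j < a.cols(); ++j) {
            a(i, j) = static_cast<double>(i * 10 + j);
        }
    });
    double total = 0.0;
    for (double v : a.storage()) total += v;
    return total;
}
```

    The kernel body is identical for both layouts; only the template argument changes how (i, j) maps to memory, which is the kind of separation the abstract describes between parallel dispatch and data layout.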