Kokkos Core Kernels Package Version of the Day
Trilinos/Kokkos: Shared-memory programming interface and computational kernels

Introduction

The Kokkos package has two main components. The first, sometimes called "Kokkos Array" or just "Kokkos," implements a performance-portable shared-memory parallel programming model and data containers. The second, called "Kokkos Classic," consists of computational kernels that support the Tpetra package.

The Kokkos programming model

Kokkos implements a performance-portable shared-memory parallel programming model and data containers. It lets you write an algorithm once and change only a template parameter to get the optimal data layout for your hardware. Kokkos has back-ends for the following parallel programming models:

  • Kokkos::Threads: C++11 Threads (std::thread)
  • Kokkos::OpenMP: OpenMP
  • Kokkos::Cuda: NVIDIA's CUDA programming model for graphics processing units (GPUs)
  • Kokkos::Serial: No thread parallelism
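
The idea of writing an algorithm once and switching back-ends through a template parameter can be sketched in plain C++17, without Kokkos itself. The tag types SerialTag and ThreadsTag and the helper for_each_index are hypothetical stand-ins for Kokkos' execution spaces and Kokkos::parallel_for. In real Kokkos code the back-end is typically selected through an execution policy such as Kokkos::RangePolicy<Kokkos::Serial>(0, n). This is an illustration of the pattern under those assumptions, not Kokkos' implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <type_traits>
#include <vector>

// Hypothetical back-end tags standing in for Kokkos::Serial and Kokkos::Threads.
struct SerialTag {};
struct ThreadsTag {};

// Calls body(i) for every i in [0, n) using the back-end named by Backend.
template <class Backend, class Body>
void for_each_index(std::size_t n, const Body& body) {
  if constexpr (std::is_same_v<Backend, SerialTag>) {
    for (std::size_t i = 0; i < n; ++i) body(i);
  } else {
    static_assert(std::is_same_v<Backend, ThreadsTag>, "unknown back-end");
    const std::size_t nthreads =
        std::max<std::size_t>(1, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < nthreads; ++t) {
      // Each thread handles one contiguous chunk of the index range.
      pool.emplace_back([&, t] {
        const std::size_t begin = n * t / nthreads;
        const std::size_t end = n * (t + 1) / nthreads;
        for (std::size_t i = begin; i < end; ++i) body(i);
      });
    }
    for (auto& th : pool) th.join();
  }
}

// The algorithm (y += a * x) is written once; only the template argument
// selects how the loop is executed.
template <class Backend>
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
  assert(x.size() == y.size());
  for_each_index<Backend>(x.size(), [&](std::size_t i) { y[i] += a * x[i]; });
}
```

Calling axpy<SerialTag>(a, x, y) or axpy<ThreadsTag>(a, x, y) gives the same result. Only the template argument differs, which is the property the Kokkos back-ends are designed to provide.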

Kokkos also has optimizations for shared-memory parallel systems with nonuniform memory access (NUMA). Its containers can hold data of any primitive ("plain old") data type (and some aggregate types). Kokkos Array may be used as a stand-alone programming model.

Kokkos' parallel operations include the following:

  • parallel_for: a thread-parallel "for loop"
  • parallel_reduce: a thread-parallel reduction
  • parallel_scan: a thread-parallel prefix scan operation

as well as expert-level platform-independent interfaces to thread "teams," per-team "shared memory," synchronization, and atomic update operations.
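
The semantics of parallel_reduce and parallel_scan can be illustrated with standard C++ alone. The helpers thread_parallel_sum and serial_scan below are hypothetical. The first splits the index range across std::thread objects and combines per-thread partial sums. The second is a serial driver that follows the calling convention of a Kokkos scan functor, body(i, update, final), where output is written only when final is true. A real thread-parallel scan makes more than one pass over the data. This sketch shows only the result such a scan must reproduce.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical thread-parallel sum: body(i, acc) adds element i's
// contribution to acc. Each thread accumulates into its own partial
// result, and the partials are combined after all threads join.
template <class T, class Body>
T thread_parallel_sum(std::size_t n, const Body& body) {
  const std::size_t nthreads =
      std::max<std::size_t>(1, std::thread::hardware_concurrency());
  std::vector<T> partial(nthreads, T{});
  std::vector<std::thread> pool;
  for (std::size_t t = 0; t < nthreads; ++t) {
    pool.emplace_back([&, t] {
      const std::size_t begin = n * t / nthreads;
      const std::size_t end = n * (t + 1) / nthreads;
      for (std::size_t i = begin; i < end; ++i) body(i, partial[t]);
    });
  }
  for (auto& th : pool) th.join();
  T result{};
  for (const T& p : partial) result += p;
  return result;
}

// Hypothetical serial scan driver following the Kokkos scan-functor
// convention body(i, update, final). On entry to body(i, ...), update
// holds the sum of the contributions of indices before i, and output is
// written only when final is true. Returns the total over all n elements.
template <class T, class Body>
T serial_scan(std::size_t n, const Body& body) {
  T update{};
  for (std::size_t i = 0; i < n; ++i) body(i, update, true);
  return update;
}
```

For example, scanning {3, 1, 4, 1, 5} with a body that stores update into the output before adding v[i] yields the exclusive prefix sums {0, 3, 4, 8, 9} and returns the total, 14.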

Kokkos' data containers include the following:

  • Kokkos::View: A multidimensional array suitable for thread-parallel operations. Its layout (e.g., row-major or column-major) is optimized by default for the particular thread-parallel device.
  • Kokkos::vector: A drop-in replacement for std::vector that eases porting from standard sequential C++ data structures to Kokkos' parallel data structures.
  • Kokkos::UnorderedMap: A parallel lookup table comparable in functionality to std::unordered_map.

Kokkos also uses the above basic containers to implement higher-level data structures, like sparse graphs and matrices.
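
Sparse graphs are commonly stored in compressed row storage (CRS) form, which consists of an offsets array and a flat array of column indices. The sketch below builds such a graph from an unsorted edge list. The member names row_map and entries follow Kokkos' StaticCrsGraph, but the CrsGraph struct and the build_crs function are hypothetical plain-C++ helpers. Step 2 is a prefix sum, which is the kind of operation parallel_scan parallelizes.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CRS graph: the neighbors of row r are
// entries[row_map[r]] ... entries[row_map[r + 1] - 1].
struct CrsGraph {
  std::vector<std::size_t> row_map;  // length num_rows + 1
  std::vector<std::size_t> entries;  // column index of each stored edge
};

// Hypothetical builder: converts an unsorted (row, column) edge list to CRS.
CrsGraph build_crs(std::size_t num_rows,
                   const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
  CrsGraph g;
  g.row_map.assign(num_rows + 1, 0);
  // 1. Count the edges in each row, stored one slot to the right.
  for (const auto& e : edges) {
    assert(e.first < num_rows);
    ++g.row_map[e.first + 1];
  }
  // 2. A running sum over the counts turns them into row offsets.
  for (std::size_t r = 0; r < num_rows; ++r) g.row_map[r + 1] += g.row_map[r];
  // 3. Scatter each edge into the next free slot of its row.
  std::vector<std::size_t> next(g.row_map.begin(), g.row_map.end() - 1);
  g.entries.resize(edges.size());
  for (const auto& e : edges) g.entries[next[e.first]++] = e.second;
  return g;
}
```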

A good way to start learning about Kokkos is with the tutorial slides from the 2013 Trilinos Users' Group meeting.

Kokkos Classic

"Kokkos Classic" consists of computational kernels that support the Tpetra package. These kernels include sparse matrix-vector multiply, sparse triangular solve, Gauss-Seidel, and dense vector operations. They are templated on the type of objects (Scalar) on which they operate. This component was not meant to be visible to users; it is an implementation detail of the Tpetra distributed linear algebra package.

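A sparse matrix-vector multiply templated on Scalar, in the spirit of the kernels described above, might look like the sketch below. CrsMatrix and spmv are hypothetical names, not the Kokkos Classic interface, and the kernel is serial. Each row's dot product is independent of the others, which makes the outer loop straightforward to parallelize.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sparse matrix in compressed row storage (CRS).
template <class Scalar>
struct CrsMatrix {
  std::size_t num_rows;
  std::vector<std::size_t> row_ptr;  // num_rows + 1 offsets into col_idx/values
  std::vector<std::size_t> col_idx;  // column of each stored entry
  std::vector<Scalar> values;        // value of each stored entry
};

// Computes y = A * x. Only Scalar(0), +=, and * are required of Scalar.
template <class Scalar>
std::vector<Scalar> spmv(const CrsMatrix<Scalar>& A, const std::vector<Scalar>& x) {
  assert(A.row_ptr.size() == A.num_rows + 1);
  std::vector<Scalar> y(A.num_rows, Scalar(0));
  // Rows are independent: a thread-parallel version would distribute this loop.
  for (std::size_t r = 0; r < A.num_rows; ++r) {
    Scalar sum(0);
    for (std::size_t k = A.row_ptr[r]; k < A.row_ptr[r + 1]; ++k)
      sum += A.values[k] * x[A.col_idx[k]];
    y[r] = sum;
  }
  return y;
}
```

Because the kernel uses only Scalar(0), +=, and *, the same code works for double, float, int, or std::complex<double>.
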
Kokkos Classic also implements a shared-memory parallel programming model. This inspired and preceded the Kokkos programming model described in the previous section. Users should consider the Kokkos Classic programming model deprecated, and prefer the new Kokkos programming model.