Today, heterogeneous programming is ubiquitous across many segments of the C++ industry. Data centers utilize GPGPUs to process vast quantities of data, supercomputers increasingly draw upon accelerators to provide the bulk of their computational power, and mobile devices frequently couple "capability" CPUs with "capacity" GPUs to provide high-efficiency computational power.
Typically, accelerators are suitable for executing only certain portions of a program. They frequently rely on CPUs to make big-picture execution decisions, access main memory, and manage peripheral hardware. The interactions between CPUs, accelerators, and other system components make heterogeneous systems complex.
Furthermore, accelerators may live off-chip, contain their own private memory, or have noticeable communication latencies. Heterogeneous systems may contain multiple CPUs that communicate via vendor-specific processor interconnects. Matters are further complicated by applications that are storage- or network-intensive.
For many C++ programmers, heterogeneous programming is no longer a luxury; instead, it has become a necessity. C++14 provides no mechanism for heterogeneous programming; C++ programmers must rely on software libraries to harness the power of accelerators. There are numerous high-quality frameworks for utilizing accelerators, but many of these frameworks are usable only with certain types of accelerators or emphasize a purely synchronous offloading model.
The C++ Accelerated Massive Parallelism (C++ AMP) open specification, published by Microsoft, presents a hardware-agnostic interface for exploiting accelerator hardware in modern C++ applications. C++ AMP consists of both language extensions and an STL-like library component. It provides support for both synchronous and asynchronous offloading. C++ AMP allows programmers to write accelerator-aware applications without having a detailed knowledge of the intricacies of heterogeneous hardware.
In addition to the mature C++ AMP implementation provided by Visual Studio, there are a number of C++ AMP implementations for a variety of platforms. To name a few, Intel is developing an implementation called Shevlin Park, and the HSA Foundation is working on a Clang-based C++ AMP implementation.
This tutorial will present an overview of C++ AMP from a software-centric viewpoint, covering the following topics:
* Programmatically preparing data for accelerators
* Transferring data to/from accelerators
* Offloading code to accelerators (e.g. restrict(amp), parallel_for_each)
* Controlling accelerator parallelism (e.g. tiling, barriers)
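As a taste of the first three topics, the following sketch shows the canonical C++ AMP offloading pattern: wrapping host data in array_views, marking a lambda with restrict(amp), and launching it with parallel_for_each. This is illustrative example code, not material from the talk itself, and it requires an AMP-capable toolchain (e.g. Visual Studio's amp.h).

```cpp
// Sketch: offloading a vector addition with C++ AMP.
// Requires an AMP-capable compiler providing <amp.h>.
#include <amp.h>
#include <vector>

void vector_add(const std::vector<float>& a,
                const std::vector<float>& b,
                std::vector<float>& c)
{
    using namespace concurrency;
    const int n = static_cast<int>(c.size());

    // array_view wraps host data; the runtime copies it to the
    // accelerator on demand and back when the results are read.
    array_view<const float, 1> av_a(n, a);
    array_view<const float, 1> av_b(n, b);
    array_view<float, 1>       av_c(n, c);
    av_c.discard_data();  // output-only: skip the host-to-device copy

    // restrict(amp) marks the lambda as compilable for the accelerator;
    // parallel_for_each launches one logical thread per index.
    parallel_for_each(av_c.extent, [=](index<1> i) restrict(amp) {
        av_c[i] = av_a[i] + av_b[i];
    });

    av_c.synchronize();  // block until results are back in c
}
```

Note that the final synchronize() is a blocking call; the asynchronous alternative is discussed below.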
This talk will emphasize usage of the asynchronous interfaces provided by C++ AMP to write wait-free offloaded code.
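To illustrate the asynchronous style, the sketch below returns a completion_future from synchronize_async instead of blocking the host thread. Again, this is an illustrative example under the same toolchain assumption, not code from the talk.

```cpp
// Sketch: non-blocking copy-back using C++ AMP's completion_future.
// Requires an AMP-capable compiler providing <amp.h>.
#include <amp.h>

concurrency::completion_future
scale_async(concurrency::array_view<float, 1> data, float factor)
{
    using namespace concurrency;

    // Launch the kernel; parallel_for_each itself returns quickly.
    parallel_for_each(data.extent, [=](index<1> i) restrict(amp) {
        data[i] *= factor;
    });

    // synchronize_async returns a future for the device-to-host copy,
    // so the caller can keep doing useful work in the meantime.
    return data.synchronize_async();
}
```

The caller can attach a continuation with completion_future::then, or simply wait on the future once the results are actually needed.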
The intended audience is C++ developers who are either using accelerators today or will be using accelerators in the future. The talk will include limited discussion of specific accelerator hardware or software implementations of C++ AMP. This presentation will be relevant to developers on all platforms, not just Windows.