C++Now 2014 has ended
Thursday, May 15 • 2:30pm - 4:00pm
Accelerator Programming with C++ AMP


Today, heterogeneous programming is ubiquitous across many segments of the C++
industry. Data centers utilize GPGPUs to process vast quantities of data,
supercomputers increasingly draw upon accelerators to provide the bulk of their
computational power, and mobile devices frequently couple "capability" CPUs with
"capacity" GPUs to provide high-efficiency computational power.

Typically, accelerators are only suitable for executing certain portions of a
program. They frequently rely on CPUs to make big-picture execution decisions,
access main memory, and manage peripheral hardware. The interactions between
CPUs, accelerators, and other system components make heterogeneous systems
complex.

Furthermore, accelerators may live off-chip, contain their own
private memory, or have noticeable communication latencies. Heterogeneous
systems may contain multiple CPUs that communicate via vendor-specific
processor interconnects. Storage- or network-intensive applications can add
further complexity.

For many C++ programmers, heterogeneous programming is no longer a luxury;
instead, it has become a necessity. C++14 provides no mechanism for
heterogeneous programming; C++ programmers must rely on software libraries to
harness the power of accelerators. There are numerous high-quality frameworks for
utilizing accelerators, but many of these frameworks are usable only
with certain types of accelerators or emphasize a purely synchronous offloading
model.

The C++ Accelerated Massive Parallelism (C++ AMP) open specification, published
by Microsoft, presents a hardware-agnostic interface for exploiting accelerator
hardware in modern C++ applications. C++ AMP consists of both language
extensions and an STL-like library component. It provides support for both
synchronous and asynchronous offloading. C++ AMP allows programmers to write
accelerator-aware applications without having a detailed knowledge of the
intricacies of heterogeneous hardware.

In addition to the mature C++ AMP implementation provided by Visual Studio, there
are a number of C++ AMP implementations for a variety of platforms. To name a
few, Intel is developing an implementation called Shevlin Park, and the HSA
Foundation is working on a Clang-based C++ AMP implementation.

This tutorial will present an overview of C++ AMP from a software-centric
viewpoint, covering the following topics:

* Programmatically preparing data for accelerators
* Transferring data to/from accelerators
* Offloading code to accelerators (e.g. restrict(amp), parallel_for_each)
* Controlling accelerator parallelism (e.g. tiling, barriers)
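The workflow in the topic list (prepare data, launch a kernel over an index space, split that space into tiles, then wait for the results) can be sketched in portable C++ as an analogy. Real C++ AMP code wraps host containers in `concurrency::array_view` and launches a `restrict(amp)` lambda with `concurrency::parallel_for_each`, which needs a compiler with C++ AMP support. The helper names below (`parallel_for_each_cpu`, `vector_add`) are hypothetical, everything runs on host threads, and the sketch does not model `tile_static` memory or tile barriers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical host-only stand-in for a tiled data-parallel launch. It calls
// kernel(i) for every i in [0, extent), splitting the index space into
// contiguous tiles of tile_size indices and running each tile on its own
// std::thread. This is NOT the C++ AMP API: nothing here runs on an accelerator.
void parallel_for_each_cpu(std::size_t extent, std::size_t tile_size,
                           const std::function<void(std::size_t)>& kernel) {
    assert(tile_size > 0);  // a zero tile size would never advance the loop
    std::vector<std::thread> tiles;
    for (std::size_t begin = 0; begin < extent; begin += tile_size) {
        const std::size_t end = std::min(begin + tile_size, extent);
        tiles.emplace_back([begin, end, &kernel] {
            for (std::size_t i = begin; i < end; ++i) kernel(i);
        });
    }
    // Wait for every tile to finish before the caller reads the results.
    for (std::thread& t : tiles) t.join();
}

// Element-wise c = a + b. Each index writes only its own element of c, so
// tiles never touch the same memory and no locking is required.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    parallel_for_each_cpu(c.size(), 256,
                          [&](std::size_t i) { c[i] = a[i] + b[i]; });
    return c;
}
```

Calling `vector_add` on two 1000-element vectors filled with `1.0f` and `2.0f` yields 1000 elements equal to `3.0f`. The tile size of 256 is an arbitrary choice for the sketch; on real accelerator hardware the tile shape is a tuning parameter.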

This talk will emphasize usage of the asynchronous interfaces provided by C++
AMP to write wait-free offloaded code.
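The asynchronous style can be approximated in standard C++ with `std::async` and `std::future`. In C++ AMP, calls such as `concurrency::copy_async` and `array_view::synchronize_async` return a `concurrency::completion_future`; the sketch below substitutes `std::future`, and its function names are hypothetical. It shows only the core idea of overlap: the host keeps working while the offloaded task runs and blocks only when it needs the result.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for an asynchronous offload. The "device work" (a sum,
// run on another host thread) starts immediately via std::async, and the
// caller receives a std::future instead of the result.
std::future<long long> offload_sum_async(std::vector<int> data) {
    return std::async(std::launch::async, [d = std::move(data)] {
        return std::accumulate(d.begin(), d.end(), 0LL);
    });
}

// Starts the offloaded work, does independent host work while it is in
// flight, and blocks only at the point where the result is needed.
long long overlapped_work(const std::vector<int>& input, int host_iterations) {
    std::future<long long> pending = offload_sum_async(input);
    long long host_result = 0;
    for (int i = 0; i < host_iterations; ++i) host_result += i;
    return pending.get() + host_result;  // the only blocking point
}
```

Unlike `std::future`, `completion_future` also provides a `then()` member for attaching a continuation, which lets code avoid even the final blocking `get()`.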

The intended audience is C++ developers who are either using accelerators today
or will be using accelerators in the future. The talk will include limited
discussion of specific accelerator hardware or software implementations of C++
AMP. This presentation will be relevant to developers on all platforms.


Bryce Adelstein Lelbach

CUDA C++ Core Libraries Lead, NVIDIA

Thursday May 15, 2014 2:30pm - 4:00pm MDT
