Automated Code Acceleration with Compilation to OpenCL

Multi-accelerator platforms combine a diverse set of accelerators and can process parallel workloads very efficiently. Exploiting them, however, requires porting applications to each accelerator using different programming languages, models and tools. Developers also need to understand low-level accelerator details, which increases design effort and cost.

To tackle this challenge, we propose HTrOP, a compilation approach with a prototype implementation. HTrOP automatically analyzes a sequential CPU application, detects computational hotspots and generates parallel OpenCL host and kernel code. We demonstrate its potential by offloading hotspots to different OpenCL-enabled resources (currently CPU, GPGPU and the manycore Intel Xeon Phi). Our contributions include:

  1. Automatic transformation of suitable data-parallel loops into independent OpenCL-typical work-items that are executed in parallel.

  2. A two-layered approach of identifying hotspots at compile time and refining offloading decisions at runtime based on parameters like input sizes, availability of accelerators, etc.

  3. Infrastructure for offloading to and migrating between accelerators, while minimizing data transfer overheads by reusing data through application-specific, generated code parts.

  4. A thorough evaluation of performance gains and energy savings with different accelerator targets, taking into account one-time and recurring overheads introduced by our approach. The evaluation includes a comparison to handwritten pragma-based OpenACC code for multicore CPUs and GPUs.
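As an illustration of the first two contributions, the following is a minimal hand-written sketch in C and not HTrOP's actual generated code. It shows a data-parallel loop (SAXPY, chosen here as an example), the same loop body expressed as a single work-item, and a host-side stand-in that runs the work-items one after another instead of enqueueing them on a device. The function names, the `OFFLOAD_MIN_ELEMENTS` threshold and the `should_offload` check are all hypothetical; the real runtime decision may consider more parameters than the input size and accelerator availability used here.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential hotspot: a data-parallel loop (SAXPY). No iteration reads a
 * value written by another iteration, so iterations can run in any order. */
void saxpy_sequential(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* The loop body as one work-item: the loop index becomes a work-item id.
 * An equivalent OpenCL C kernel would look roughly like this:
 *
 *   __kernel void saxpy(const uint n, const float a,
 *                       __global const float *x, __global float *y)
 *   {
 *       size_t gid = get_global_id(0);
 *       if (gid < n) y[gid] = a * x[gid] + y[gid];
 *   }
 */
void saxpy_work_item(size_t gid, size_t n, float a, const float *x, float *y)
{
    if (gid < n) /* guard: the global size may be padded beyond n */
        y[gid] = a * x[gid] + y[gid];
}

/* Host-side stand-in for launching an NDRange. Work-items are executed
 * sequentially here; an OpenCL device would run them in parallel. */
void saxpy_emulated_ndrange(size_t global_size, size_t n, float a,
                            const float *x, float *y)
{
    assert(global_size >= n); /* every element needs a work-item */
    for (size_t gid = 0; gid < global_size; ++gid)
        saxpy_work_item(gid, n, a, x, y);
}

/* Hypothetical runtime refinement: offload only if an accelerator is
 * available and the input is large enough to amortize the overheads.
 * The threshold is an assumed value for illustration only. */
#define OFFLOAD_MIN_ELEMENTS 4096

int should_offload(size_t n, int accelerator_available)
{
    return accelerator_available && n >= OFFLOAD_MIN_ELEMENTS;
}
```

A caller would first evaluate `should_offload(n, available)` and then pick either `saxpy_sequential` or the work-item version. Because the iterations are independent, both paths produce the same contents in `y`.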

Source Code

The source code of our prototype implementation is available at github.com/pc2/htrop.

Publications

Transparent Acceleration for Heterogeneous Platforms with Compilation to OpenCL
H. Riebler, G.F. Vaz, T. Kenter, C. Plessl, ACM Trans. Archit. Code Optim. (TACO) 16 (2019) 14:1–14:26.
Automated Code Acceleration Targeting Heterogeneous OpenCL Devices
H. Riebler, G.F. Vaz, T. Kenter, C. Plessl, in: Proc. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), ACM, 2018.

Keywords

Transparent Acceleration; Runtime System; Runtime Decision; Multi-Accelerator; OpenCL; OpenACC; Offloading; Migration; Performance and Energy; Software and its engineering; Runtime environments; Incremental compilers; Computing methodologies; Parallel programming languages; Computer systems organization; Heterogeneous systems; Heterogeneous (hybrid) systems; Accelerator Programming; Hotspot Detection; Code Generation; Code Generation Decision; Parallel Kernel Code; Data Optimization

Dr. Heinrich Riebler

Paderborn Center for Parallel Computing (PC2)

Scientific Advisor FPGA Acceleration

Phone: +49 5251 60-5382

Dr. Tobias Kenter

Paderborn Center for Parallel Computing (PC2)

Scientific Advisor FPGA Acceleration

Phone: +49 5251 60-4340