Accelerator-specific OpenCL backend for LLVM

In a nutshell

Improve our existing code generator (LLVM IR --> OpenCL) in the following ways:

  • Add accelerator-specific optimizations (OpenCL for GPU, OpenCL for FPGA, OpenCL for Xeon Phi, etc.)
  • Optimize for specific system requirements (performance, energy, etc.) by guiding the backend
  • Combine with information provided by polyhedral optimization
  • Compare results to hand-written code

Summary

Systems are becoming increasingly heterogeneous (CPU+GPU, CPU+FPGA, etc.), which opens up multiple opportunities for accelerating applications. However, programming different accelerators is cumbersome, time-consuming and error-prone. To automate the process, we use the LLVM compiler infrastructure: we analyse an application for computationally intensive parts (hotspots), optimize the hotspots and finally just-in-time (JIT) generate accelerated OpenCL code suitable for various heterogeneous resources (GPUs, FPGAs, etc.).

The Open Computing Language (OpenCL) provides a standard interface for parallel computing using task- and data-based parallelism, and its programs can be executed across different heterogeneous devices (CPUs, GPUs, MICs and FPGAs). As we want to target multiple accelerators and adopt a “write program code once, run anywhere” approach, we use a basic OpenCL backend in our LLVM tool flow. The backend generates functionally portable OpenCL code that can be run on different accelerators. Because the underlying architectures of these accelerators differ, the performance of this portable code differs from that of accelerator-specific OpenCL code.
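To make the gap between portable and accelerator-specific code concrete, the following minimal sketch emits OpenCL C source for the same element-wise addition in two shapes. It assumes, for illustration only, that a GPU is best served by an NDRange kernel with one work-item per element, while an FPGA toolchain often favours a single work-item kernel whose loop it can pipeline. The names `Target`, `emitVectorAdd` and `vadd` are hypothetical and are not part of the existing backend.

```cpp
#include <cassert>
#include <string>

// Hypothetical target categories; not the existing backend's interface.
enum class Target { GPU, FPGA };

// Emit OpenCL C source for c[i] = a[i] + b[i], shaped for the given target.
std::string emitVectorAdd(Target target, const std::string& elemType = "float") {
    assert(!elemType.empty());
    const std::string params =
        "__global const " + elemType + "* a, __global const " + elemType +
        "* b, __global " + elemType + "* c, const int n";

    if (target == Target::GPU) {
        // NDRange style: one work-item per element, so the runtime can
        // spread the work-items over the GPU's compute units.
        return "__kernel void vadd(" + params + ") {\n"
               "    int i = get_global_id(0);\n"
               "    if (i < n) c[i] = a[i] + b[i];\n"
               "}\n";
    }

    // Single work-item style: one explicit loop that an FPGA toolchain
    // can turn into a pipeline.
    return "__kernel void vadd(" + params + ") {\n"
           "    for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];\n"
           "}\n";
}
```

Calling `emitVectorAdd(Target::GPU)` and `emitVectorAdd(Target::FPGA)` yields two kernels that compute the same result but map differently onto the hardware. A real accelerator-specific backend would make many more such decisions (memory layout, vectorization, unrolling) from the LLVM IR rather than from a fixed template.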

The first goal of this thesis is to extend this basic OpenCL backend so that it takes accelerator-specific optimizations and the system's performance and energy requirements into account when generating optimized OpenCL code. Additionally, for the sake of simplicity, the current OpenCL backend makes some assumptions about the loop structure of the input code, which limits the applicability of our approach. Polyhedral techniques have been used to detect hotspot structures and to perform automatic parallelization, data locality optimizations, etc. The second goal of this thesis is to investigate how polyhedral techniques can be used to improve the current OpenCL backend so that it supports a wider range of loop structures and generates faster OpenCL code.
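As an illustration of the kind of data locality optimization that polyhedral techniques can derive automatically, the sketch below applies loop tiling by hand to a matrix multiplication. The example and the names `Matrix`, `matmulNaive` and `matmulTiled` are assumptions chosen for illustration; they are not taken from the existing backend, and a polyhedral framework would compute such a schedule from the loop nest's iteration domain and dependences instead of having it written manually.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<float>;  // row-major storage of an n x n matrix

// Reference loop nest: C = A * B.
Matrix matmulNaive(const Matrix& A, const Matrix& B, std::size_t n) {
    Matrix C(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
    return C;
}

// Tiled loop nest: the same iterations, grouped into tile x tile x tile
// blocks so that each block reuses a small working set of A, B and C.
Matrix matmulTiled(const Matrix& A, const Matrix& B, std::size_t n,
                   std::size_t tile) {
    assert(tile > 0);
    Matrix C(n * n, 0.0f);
    for (std::size_t ii = 0; ii < n; ii += tile)
        for (std::size_t jj = 0; jj < n; jj += tile)
            for (std::size_t kk = 0; kk < n; kk += tile)
                // std::min handles partial tiles when tile does not divide n.
                for (std::size_t i = ii; i < std::min(ii + tile, n); ++i)
                    for (std::size_t j = jj; j < std::min(jj + tile, n); ++j)
                        for (std::size_t k = kk; k < std::min(kk + tile, n); ++k)
                            C[i * n + j] += A[i * n + k] * B[k * n + j];
    return C;
}
```

Both functions visit exactly the same (i, j, k) iterations, and for each output element the products are still accumulated in ascending k, so the transformation preserves the result. In generated OpenCL code a tile could, for instance, correspond to a work-group that stages its block in local memory; which tile size pays off depends on the target accelerator.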

We provide

  • Interesting, research-related topics
  • Heterogeneous server with state-of-the-art software and hardware
  • Good work atmosphere
  • Optionally: work area in student lab

You should bring

  • Good C/C++ knowledge
  • Motivation and the ability to work autonomously
  • Ideally: OpenCL experience
  • Ideally: LLVM experience

Dr. Heinrich Riebler

Paderborn Center for Parallel Computing (PC2)

Technical Advisor, FPGA Acceleration

Phone: +49 5251 60-5382