Dr. Tobias Kenter, Universität Paderborn
Simulations of electromagnetic effects in novel materials and surfaces contribute a significant workload to our HPC systems. To improve the performance and energy efficiency of these workloads, we have investigated them as a target for FPGA acceleration, focusing on applications that operate on unstructured meshes and use the Discontinuous Galerkin method. In an initial, extensive case study, we found that several characteristics enable these applications to profit well from the flexibility of FPGA architectures. A single FPGA of the previous Arria 10 generation outperforms the two-socket CPU nodes of our previous HPC cluster, Oculus, by around 1.5–2x at much lower power consumption. With the availability of multiple FPGA nodes connected by a high-speed interconnect, we scaled this application to up to 32 Stratix 10 FPGAs, communicating through the host via MPI. Multiple FPGAs make it possible to solve larger problems, or to solve smaller problems faster, but the parallel efficiency still left room for improvement. The addition of a direct FPGA-to-FPGA interconnect provides the foundation for such improvement. Initial designs on 2 and 4 FPGAs resolved the efficiency problems and indicated significant headroom for more communication-intensive scenarios.
In the collaborative BMBF-funded project HighPerMeshes, we are working on generalizing these case studies to further applications on unstructured meshes by means of a domain-specific language (DSL) embedded in C++. The manually optimized designs will be complemented by code generation for FPGAs and other targets. Scaling is achieved transparently to the application writer through distributed execution of suitable loop structures within the DSL.