

FPGA Acceleration of Electromagnetic Simulations

Figure: FPGA cards installed in one of the Noctua servers.

Dr. Tobias Kenter, Universität Paderborn

Simulations of electromagnetic effects in novel materials and surfaces make up a significant share of the workload on our HPC systems. To improve the performance and energy efficiency of these workloads, we have investigated them as a target for FPGA acceleration. We focused on applications that operate on unstructured meshes and use the Discontinuous Galerkin method. In an initial, extensive case study, we found that several characteristics of these applications let them benefit from the flexibility of FPGA architectures. A single FPGA of the previous Arria 10 generation outperforms a two-socket CPU node of the previous Oculus HPC cluster by around 1.5–2x, at much lower power consumption.

With multiple FPGA nodes connected by a high-speed interconnect, we scaled this application to up to 32 Stratix 10 FPGAs, communicating through the host via MPI. Multiple FPGAs make it possible to solve larger problems, or to solve smaller problems faster, but parallel efficiency still left room for improvement. Direct FPGA-to-FPGA interconnect provides the foundation for such improvement. In initial designs on 2 and 4 FPGAs, the efficiency problems were resolved, and the results indicated significant headroom for more communication-intensive scenarios.
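For reference, parallel efficiency is commonly defined for strong scaling, where the total problem size stays fixed. Here $T_1$ is the runtime on one FPGA and $T_p$ the runtime on $p$ FPGAs. These are the standard textbook definitions. The text does not say which variant or which measured values underlie its assessment.

```latex
S_p = \frac{T_1}{T_p}, \qquad
E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}, \qquad 0 < E_p \le 1 \text{ (ideally } E_p = 1\text{)}
```

As a purely hypothetical illustration, suppose 4 FPGAs finish a run 3 times faster than one FPGA. Then $S_4 = 3$ and $E_4 = 0.75$. The missing quarter is time lost to communication and synchronization, which is the cost that direct FPGA-to-FPGA links aim to reduce compared to routing data through the host.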

In the collaborative BMBF-funded project HighPerMeshes, we are working to generalize these case studies to further applications on unstructured meshes. We do this with a domain-specific language (DSL) embedded in C++. Code generation for FPGAs and other targets will complement the manually optimized designs. Scaling is transparent to the application writer: the DSL executes suitable loop structures in a distributed fashion.
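The idea of "transparent scaling through distributed loop structures" can be illustrated with a minimal C++ sketch. The application writer supplies a kernel for a single mesh element. The DSL layer decides how the element range is split across partitions, which could be FPGAs or MPI ranks. All names here (`Mesh`, `Range`, `Partition`, `ForEachElement`) are hypothetical and are not the HighPerMeshes API. The sketch also runs the partitions sequentially on one host. A real distributed backend would place each partition on its own device and exchange neighbor (halo) data between iterations, and that exchange is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical mesh description: only the element count matters for this sketch.
struct Mesh {
    std::size_t num_elements;
};

// Half-open element range [begin, end) owned by one partition.
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Split [0, n) into p contiguous, nearly equal ranges.
std::vector<Range> Partition(std::size_t n, std::size_t p) {
    assert(p > 0);
    std::vector<Range> parts;
    std::size_t base = n / p, rest = n % p, start = 0;
    for (std::size_t r = 0; r < p; ++r) {
        // The first `rest` partitions take one extra element.
        std::size_t len = base + (r < rest ? 1 : 0);
        parts.push_back({start, start + len});
        start += len;
    }
    assert(start == n);  // every element is owned by exactly one partition
    return parts;
}

// DSL-level loop: the kernel only sees an element index, never the partitioning.
// Partitions are processed one after another here; a distributed backend
// (assumed, not shown) would run each one on its own device.
void ForEachElement(const Mesh& mesh, std::size_t num_partitions,
                    const std::function<void(std::size_t)>& kernel) {
    for (const Range& part : Partition(mesh.num_elements, num_partitions))
        for (std::size_t e = part.begin; e < part.end; ++e)
            kernel(e);
}
```

Because the kernel never refers to partitions, the same application code runs unchanged on 1 or 32 partitions. Each element is visited exactly once either way. This separation is what lets a DSL change the execution layout without touching application code.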

References

Tobias Kenter, Gopinath Mahale, Samer Alhaddad, Yevgen Grynko, Christian Schmitt, Ayesha Afzal, Frank Hannig, Jens Förstner and Christian Plessl,
"OpenCL-based FPGA Design to Accelerate the Nodal Discontinuous Galerkin Method for Unstructured Meshes"
In Proc. IEEE Symp. on Field-Programmable Custom Computing Machines (FCCM), 2018, IEEE.
