Paper Title
GPU-Accelerated Discontinuous Galerkin Methods: 30x Speedup on 345 Billion Unknowns
Authors
Abstract
A discontinuous Galerkin method for the discretization of the compressible Euler equations, the governing equations of inviscid fluid dynamics, on Cartesian meshes is developed for use of Graphical Processing Units via OCCA, a unified approach to performance portability on multi-threaded hardware architectures. A 30x time-to-solution speedup over CPU-only implementations using non-CUDA-Aware MPI communications is demonstrated up to 1,536 NVIDIA V100 GPUs and parallel strong scalability is shown up to 6,144 NVIDIA V100 GPUs for a problem containing 345 billion unknowns. A comparison of CUDA-Aware MPI communication to non-GPUDirect communication is performed demonstrating an additional 24% speedup on eight nodes composed of 32 NVIDIA V100 GPUs.
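To give a sense of scale for the figures quoted in the abstract, the following minimal sketch works out the implied per-GPU load and node layout. It assumes the unknowns are spread evenly across GPUs and that the 32 GPUs are split evenly across the eight nodes; neither is stated in the abstract, and the variable names are illustrative only.

```python
# Figures quoted in the abstract.
total_unknowns = 345_000_000_000  # "345 billion unknowns"
gpus_strong_scaling = 6_144       # largest strong-scaling run
gpus_mpi_test = 32                # CUDA-Aware MPI comparison
nodes_mpi_test = 8

# Assumption: unknowns are distributed evenly across all GPUs.
unknowns_per_gpu = total_unknowns / gpus_strong_scaling

# Assumption: GPUs are distributed evenly across the nodes.
gpus_per_node = gpus_mpi_test // nodes_mpi_test

print(f"{unknowns_per_gpu:,.0f} unknowns per GPU")
print(f"{gpus_per_node} GPUs per node")
```

Under these assumptions, the largest run leaves roughly 56 million unknowns on each GPU, and the communication comparison corresponds to four GPUs per node.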