Paper title

Portability for GPU-accelerated molecular docking applications for cloud and HPC: can portable compiler directives provide performance across all platforms?

Authors

Thavappiragasam, Mathialakan; Elwasif, Wael; Sedova, Ada

Abstract

High-throughput structure-based screening of drug-like molecules has become a common tool in biomedical research. Recently, acceleration with graphics processing units (GPUs) has provided a large performance boost for molecular docking programs. Both cloud and high-performance computing (HPC) resources have been used for large screens with molecular docking programs; while NVIDIA GPUs have dominated cloud and HPC resources, new vendors such as AMD and Intel are now entering the field, creating the problem of software portability across different GPUs. Ideally, software productivity could be maximized with portable programming models that are able to maintain high performance across architectures. While in many cases compiler directives have been used as an easy way to offload parallel regions of a CPU-based program to a GPU accelerator, they may also be an attractive programming model for providing portability across different GPU vendors, in which case the porting process may proceed in the reverse direction: from low-level, architecture-specific code to higher-level directive-based abstractions. MiniMDock is a new mini-application (miniapp) designed to capture the essential computational kernels found in molecular docking calculations, such as those used in pharmaceutical drug discovery efforts, in order to test different solutions for porting across GPU architectures. Here we extend MiniMDock to GPU offloading with OpenMP directives, and compare its performance with that of kernels using CUDA and HIP on both NVIDIA and AMD GPUs, as well as across different compilers, exploring performance bottlenecks. We document this reverse-porting process, from highly optimized device code to a higher-level version using directives, compare code structure, and describe barriers that were overcome in this effort.
