Paper Title
Petascale XCT: 3D Image Reconstruction with Hierarchical Communications on Multi-GPU Nodes
Authors
Abstract
X-ray computed tomography is a commonly used technique for noninvasive imaging at synchrotron facilities. Iterative tomographic reconstruction algorithms are often preferred for recovering high-quality 3D volumetric images from 2D X-ray images; however, their computational requirements have limited their use to small and medium-sized datasets. In this paper, we propose a high-performance iterative reconstruction system for terabyte-scale 3D volumes. Our design involves three novel optimizations: (1) optimizing the (back)projection operators by extending the 2D memory-centric approach to 3D; (2) performing hierarchical communications by exploiting the "fat-node" architecture with many GPUs per node; and (3) using mixed-precision types while preserving convergence rate and quality. We extensively evaluate the proposed optimizations and their scaling on the Summit supercomputer. Our largest reconstruction is a mouse brain volume with 9K×11K×11K voxels, for which the total reconstruction time is under three minutes using 24,576 GPUs, reaching 65 PFLOPS: 34% of Summit's peak performance.
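To give a sense of scale, the figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and rests on assumptions not stated in the abstract: "K" is taken as 1,000, each voxel is stored as a 4-byte single-precision value, and voxels are split evenly across GPUs. The function names are hypothetical.

```python
# Back-of-the-envelope check of the abstract's headline numbers.
# Assumptions (not from the paper): K = 1000, 4 bytes per voxel,
# and an even split of voxels across GPUs.

def volume_stats(dims=(9_000, 11_000, 11_000), bytes_per_voxel=4):
    """Return (voxel count, storage in bytes) for a 3D volume."""
    nx, ny, nz = dims
    voxels = nx * ny * nz
    return voxels, voxels * bytes_per_voxel

def implied_peak_pflops(achieved_pflops=65.0, fraction_of_peak=0.34):
    """Peak performance implied by the achieved rate and its share of peak."""
    return achieved_pflops / fraction_of_peak

voxels, size_bytes = volume_stats()
voxels_per_gpu = voxels / 24_576  # GPU count quoted in the abstract

print(f"voxels:           {voxels:.3e}")
print(f"volume size:      {size_bytes / 1e12:.2f} TB")
print(f"voxels per GPU:   {voxels_per_gpu:.3e}")
print(f"implied peak:     {implied_peak_pflops():.0f} PFLOPS")
```

Under these assumptions the reconstructed volume alone holds roughly a trillion voxels and occupies several terabytes, which is consistent with the "terabyte-scale" description; the measured 2D projection data is additional and is not estimated here.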