A Mixed Precision, Multi-GPU Design for Large-scale Top-K Sparse Eigenproblems

Paper title

A Mixed Precision, Multi-GPU Design for Large-scale Top-K Sparse Eigenproblems

Authors

Francesco Sgherzi, Alberto Parravicini, Marco Domenico Santambrogio

Abstract

Graph analytics techniques based on spectral methods process extremely large sparse matrices with millions or even billions of non-zero values. Behind these algorithms lies the Top-K sparse eigenproblem, the computation of the largest eigenvalues and their associated eigenvectors. In this work, we leverage GPUs to scale the Top-K sparse eigenproblem to larger matrices than previously achieved while also providing state-of-the-art execution times. We can transparently partition the computation across multiple GPUs, process out-of-core matrices, and tune precision and execution time using mixed-precision floating-point arithmetic. Overall, we are 67 times faster than the highly optimized ARPACK library running on a 104-thread CPU and 1.9 times faster than a recent FPGA hardware design. We also determine how mixed-precision floating-point arithmetic improves execution time by 50% over double-precision, and is 12 times more accurate than single-precision floating-point arithmetic.
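
To make the Top-K sparse eigenproblem concrete, here is a minimal CPU-only sketch. It uses SciPy's eigsh, which wraps the ARPACK library named in the abstract. It is not the authors' GPU implementation, and casting the whole matrix to float32 is only a crude stand-in for their mixed-precision scheme. The helper names (random_symmetric, top_k_eigenpairs, max_residual), the random test matrix, and the choice of the largest algebraic eigenvalues (which="LA") are all assumptions made for illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh


def random_symmetric(n, nnz, seed=0):
    """Hypothetical test input: a random symmetric sparse matrix with non-negative weights."""
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, n, size=nnz)
    cols = rng.integers(0, n, size=nnz)
    data = rng.random(nnz)
    m = sp.coo_matrix((data, (rows, cols)), shape=(n, n)).tocsr()
    # Symmetrize so that the symmetric eigensolver applies.
    return ((m + m.T) * 0.5).tocsr()


def top_k_eigenpairs(matrix, k, dtype=np.float64):
    """Compute the k largest (algebraic) eigenvalues and eigenvectors in the given precision."""
    a = matrix.astype(dtype)
    # Fixed start vector in the working precision keeps the run deterministic.
    v0 = np.ones(a.shape[0], dtype=dtype)
    return eigsh(a, k=k, which="LA", v0=v0)


def max_residual(matrix, vals, vecs):
    """Largest ||A v - lambda v||_2 over the computed pairs, always evaluated in float64."""
    a = matrix.astype(np.float64)
    v = vecs.astype(np.float64)
    lam = vals.astype(np.float64)
    return float(np.max(np.linalg.norm(a @ v - v * lam, axis=0)))


if __name__ == "__main__":
    A = random_symmetric(400, 4000)
    for dtype in (np.float32, np.float64):
        vals, vecs = top_k_eigenpairs(A, k=4, dtype=dtype)
        print(dtype.__name__, "max residual:", max_residual(A, vals, vecs))
```

Measuring the residual in double precision regardless of the solver's working precision gives one simple way to compare the accuracy of the two runs. The accuracy and speed figures quoted in the abstract come from the authors' own solver and hardware and cannot be reproduced with this sketch.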
