Paper Title


cuFasterTucker: A Stochastic Optimization Strategy for Parallel Sparse FastTucker Decomposition on GPU Platform

Authors

Li, Zixuan

Abstract


Currently, the size of scientific data is growing at an unprecedented rate. Data in the form of tensors exhibit high-order, high-dimensional, and highly sparse features. Although tensor-based analysis methods are very effective, the sharp increase in data size makes the original tensor impossible to process directly. Tensor decomposition factorizes a tensor into multiple low-rank matrices or tensors that tensor-based analysis methods can then exploit. Tucker decomposition is one such algorithm: it decomposes an $n$-order tensor into $n$ low-rank factor matrices and a low-rank core tensor. However, most Tucker decomposition methods involve huge intermediate variables and a huge computational load, making them unable to process high-order and high-dimensional tensors. In this paper, we propose FasterTucker decomposition, which builds on FastTucker decomposition, a variant of Tucker decomposition, and we propose cuFasterTucker, an efficient parallel FasterTucker decomposition algorithm for the GPU platform. It has very low storage and computational requirements and effectively solves the problem of high-order and high-dimensional sparse tensor decomposition. Compared with the state-of-the-art algorithm, it achieves speedups of around $15\times$ and $7\times$ in updating the factor matrices and updating the core matrices, respectively.
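
To make the factorization concrete, below is a minimal NumPy sketch of what Tucker decomposition expresses for a 3-order tensor: a small core tensor multiplied along each mode by a factor matrix. The dimensions, ranks, and random data are hypothetical, and this dense toy ignores the sparsity handling and GPU parallelism that cuFasterTucker is built around; it only illustrates the factorization itself.

import numpy as np

# Hypothetical tensor dimensions and Tucker ranks.
I, J, K = 20, 30, 40
R1, R2, R3 = 4, 5, 6

rng = np.random.default_rng(0)
G = rng.standard_normal((R1, R2, R3))   # low-rank core tensor
A = rng.standard_normal((I, R1))        # factor matrix for mode 1
B = rng.standard_normal((J, R2))        # factor matrix for mode 2
C = rng.standard_normal((K, R3))        # factor matrix for mode 3

# Full reconstruction: X = G x_1 A x_2 B x_3 C (mode-n products).
X = np.einsum('abc,ia,jb,kc->ijk', G, A, B, C)

# For a sparse tensor, only observed entries (i, j, k) need to be
# reconstructed, one inner product with the core per entry.
i, j, k = 3, 7, 11
x_ijk = np.einsum('abc,a,b,c->', G, A[i], B[j], C[k])
assert np.isclose(X[i, j, k], x_ijk)

The per-entry form in the last lines is the reason sparse Tucker methods can avoid materializing the full tensor: the cost of evaluating an observed entry depends only on the ranks, not on the tensor's dimensions.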
