快速，安全的分布式非负矩阵分解

论文标题

快速，安全的分布式非负矩阵分解

Fast and Secure Distributed Nonnegative Matrix Factorization

论文作者

Qian, Yuqiu, Tan, Conghui, Ding, Danhao, Li, Hui, Mamoulis, Nikos

论文摘要

非负矩阵分解（NMF）已成功应用于多个数据挖掘任务。最近，由于大型矩阵的高成本，人们对NMF的加速有越来越多的兴趣。另一方面，由于普遍将NMF应用于图像和文本分析中，因此NMF的隐私问题值得关注，因为NMF可能涉及在几个方（例如医院）中利用隐私数据（例如，医疗图像和记录）。在本文中，我们研究了分布式NMF的加速度和安全问题。首先，我们为NMF提出了一种分布式的草图交替的非负最小二乘（DSANLS）框架，该框架利用矩阵草图技术来减少具有收敛保证的非决定性最小二乘子问题的大小。对于第二个问题，我们表明具有修改的DSANL可以适应安全设置，但仅适用于一个或有限的迭代。因此，我们提出了具有安全保证的同步和异步设置中的四种有效分布式NMF方法。我们在几个实际数据集上进行了广泛的实验，以显示我们提出的方法的优越性。我们的方法的实现可在https://github.com/qianyuqiu79/dsanls上获得。

Nonnegative matrix factorization (NMF) has been successfully applied in several data mining tasks. Recently, there is an increasing interest in the acceleration of NMF, due to its high cost on large matrices. On the other hand, the privacy issue of NMF over federated data is worthy of attention, since NMF is prevalently applied in image and text analysis which may involve leveraging privacy data (e.g, medical image and record) across several parties (e.g., hospitals). In this paper, we study the acceleration and security problems of distributed NMF. Firstly, we propose a distributed sketched alternating nonnegative least squares (DSANLS) framework for NMF, which utilizes a matrix sketching technique to reduce the size of nonnegative least squares subproblems with a convergence guarantee. For the second problem, we show that DSANLS with modification can be adapted to the security setting, but only for one or limited iterations. Consequently, we propose four efficient distributed NMF methods in both synchronous and asynchronous settings with a security guarantee. We conduct extensive experiments on several real datasets to show the superiority of our proposed methods. The implementation of our methods is available at https://github.com/qianyuqiu79/DSANLS.

下载PDF全文

下载文献需遵守相关版权规定

论文标题