Title
An Alternating Rank-K Nonnegative Least Squares Framework (ARkNLS) for Nonnegative Matrix Factorization
Authors
Abstract
Nonnegative matrix factorization (NMF) is a prominent technique for data dimensionality reduction that has been widely used in text mining, computer vision, pattern discovery, and bioinformatics. In this paper, a framework called ARkNLS (Alternating Rank-k Nonnegativity-constrained Least Squares) is proposed for computing NMF. First, a recursive formula for the solution of the rank-k nonnegativity-constrained least squares (NLS) problem is established. This recursive formula can be used to derive the closed-form solution of the rank-k NLS problem for any integer k $\ge$ 1. As a result, each subproblem in an alternating rank-k NLS framework can be solved using this closed-form solution. Assume that all matrices involved in rank-k NLS in the context of NMF computation are of full rank. Under this assumption, two of the currently best NMF algorithms can be viewed as special cases of ARkNLS for rank-r NMF: HALS (Hierarchical Alternating Least Squares) with k = 1, and ANLS-BPP (Alternating NLS based on Block Principal Pivoting) with k = r. The paper then focuses on the framework with k = 3, which leads to a new NMF algorithm based on the closed-form solution of the rank-3 NLS problem. Furthermore, a new strategy is presented that efficiently overcomes the potential singularity problem in rank-3 NLS within the context of NMF computation. Extensive numerical comparisons on real and synthetic data sets show that the proposed algorithm provides state-of-the-art performance in terms of both computational accuracy and CPU time.
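The following minimal Python/NumPy sketch of the alternating rank-k idea may make the abstract more concrete. It rests on several assumptions. The helper names `rank_k_nls` and `arknls` and the parameters `k`, `iters`, and `seed` are hypothetical and do not come from the paper.

The sketch splits the r components into blocks of at most k. Each block update is then the rank-k NLS problem $\min_{X \ge 0} \|R - W_b X\|_F$, where $R$ is the residual with the other blocks held fixed.

The paper's recursive closed-form formula is not reproduced here. Instead, the sketch swaps in exhaustive support enumeration:

- it solves the unconstrained problem restricted to each of the $2^k$ candidate supports,
- projects each candidate onto $X \ge 0$,
- and keeps the candidate with the smallest objective.

When the Gram matrix is nonsingular (the abstract's full-rank assumption), the true minimizer is among the candidates, so the result is exact. For k = 1 it reduces to the HALS-style update $\max(0, c/g)$. The paper's singularity strategy for rank-3 NLS is not implemented.

```python
import itertools

import numpy as np


def rank_k_nls(G, C):
    """Solve min_{X >= 0} 0.5*tr(X^T G X) - tr(C^T X) column by column.

    G = W_b^T W_b is k-by-k and C = W_b^T R is k-by-n. The result is exact when
    G is nonsingular. Every support set of the k unknowns is tried (2^k
    candidates), so this enumeration is only practical for small k such as 3.
    """
    k, n = C.shape
    best_X = np.zeros((k, n))  # empty support: x = 0 with objective 0
    best_f = np.zeros(n)
    for size in range(1, k + 1):
        for S in itertools.combinations(range(k), size):
            S = list(S)
            X = np.zeros((k, n))
            # Unconstrained LS restricted to support S. lstsq matches the exact
            # solve when G[S, S] is nonsingular; otherwise it merely avoids a
            # crash and is NOT the paper's singularity strategy.
            sol = np.linalg.lstsq(G[np.ix_(S, S)], C[S], rcond=None)[0]
            X[S] = np.maximum(sol, 0.0)  # project onto the nonnegative orthant
            # Per-column objective 0.5 x^T G x - c^T x of this feasible candidate.
            f = 0.5 * np.sum(X * (G @ X), axis=0) - np.sum(C * X, axis=0)
            better = f < best_f
            best_X[:, better] = X[:, better]
            best_f = np.where(better, f, best_f)
    return best_X


def arknls(A, r, k=3, iters=100, seed=0):
    """Hypothetical alternating rank-k NLS sketch for A ~= W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    # Partition the r components into consecutive blocks of size <= k.
    blocks = [list(range(s, min(s + k, r))) for s in range(0, r, k)]
    for _ in range(iters):
        # H-step: block b solves min_{H_b >= 0} ||A - W_rest H_rest - W_b H_b||_F.
        WtW, WtA = W.T @ W, W.T @ A
        for b in blocks:
            # C = W_b^T (A - W_rest H_rest), using the latest rows of H.
            C = WtA[b] - WtW[b] @ H + WtW[np.ix_(b, b)] @ H[b]
            H[b] = rank_k_nls(WtW[np.ix_(b, b)], C)
        # W-step: the same subproblem on the transposed model A^T ~= H^T W^T.
        HHt, HAt = H @ H.T, H @ A.T
        for b in blocks:
            C = HAt[b] - HHt[b] @ W.T + HHt[np.ix_(b, b)] @ W[:, b].T
            W[:, b] = rank_k_nls(HHt[np.ix_(b, b)], C).T
    return W, H
```

For example, `W, H = arknls(A, r=10, k=3)` returns nonnegative factors with `A` approximately equal to `W @ H`. The $2^k$ enumeration cost is why this particular route only suits small k.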