Title
A Neural Network for Determination of Latent Dimensionality in Nonnegative Matrix Factorization
Authors
Abstract
Nonnegative Matrix Factorization (NMF) has proven to be a powerful unsupervised learning method for uncovering hidden features in complex and noisy data sets, with applications in data mining, text recognition, dimensionality reduction, face recognition, anomaly detection, blind source separation, and many other fields. A key input to NMF is the latent dimensionality of the data, that is, the number K of hidden features present in the data set. Unfortunately, this quantity is rarely known a priori. We combine a supervised machine learning approach with NMFk, a recent model-determination method, to determine the number of hidden features automatically. For each candidate K, NMFk runs NMF on an ensemble of matrices obtained by bootstrapping the initial data set, and determines which K yields stable groups of latent features that also reconstruct the initial data set well. We then train a multilayer perceptron (MLP) classifier to predict the correct number of latent features from the statistics and characteristics of the NMF solutions obtained by NMFk. To build the training set for the MLP classifier, 58,660 matrices with a predetermined number of latent features were factorized with NMFk. Combined with NMFk, the MLP classifier achieves a success rate above 95% on a held-out test set. Additionally, on two well-known benchmark data sets, the Swimmer and MIT face data, NMFk/MLP correctly recovers the established number of hidden features. Finally, we compare the accuracy of our method with that of Automatic Relevance Determination (ARD), the Akaike Information Criterion (AIC), and stability-based methods.
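The NMFk idea summarized above (bootstrap the data, factorize each copy for a candidate K, then check whether the latent features are stable and reconstruct the data well) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. NMF is solved with Lee-Seung multiplicative updates; the bootstrap is approximated by multiplicative uniform noise on every entry; features from different runs are matched to the first run by the best column permutation (a simplification; the clustering used in NMFk itself may differ); and stability is measured as the smallest per-cluster mean silhouette under cosine distance. The names `nmf_mu`, `align_columns`, `min_cluster_silhouette` and `nmfk_stats`, and the parameters `n_runs` and `noise`, are hypothetical.

```python
import itertools

import numpy as np


def nmf_mu(X, k, rng, iters=300, eps=1e-9):
    """Plain NMF, X ~ W @ H, using Lee-Seung multiplicative updates (Frobenius loss)."""
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def align_columns(W, ref):
    """Unit-normalize W's columns and permute them to best match ref (cosine similarity).

    Brute force over all permutations, so only practical for small k.
    """
    Wn = W / (np.linalg.norm(W, axis=0) + 1e-12)
    Rn = ref / (np.linalg.norm(ref, axis=0) + 1e-12)
    sim = Rn.T @ Wn  # sim[i, j] = cosine(ref column i, W column j)
    k = W.shape[1]
    best = max(itertools.permutations(range(k)),
               key=lambda p: sum(sim[i, p[i]] for i in range(k)))
    return Wn[:, list(best)]


def min_cluster_silhouette(points, labels):
    """Smallest per-cluster mean silhouette width, using cosine distance."""
    P = points / (np.linalg.norm(points, axis=1, keepdims=True) + 1e-12)
    D = np.clip(1.0 - P @ P.T, 0.0, None)  # clip tiny negative rounding errors
    clusters = np.unique(labels)
    s = np.zeros(len(P))
    for i in range(len(P)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b, 1e-12)
    return float(min(s[labels == c].mean() for c in clusters))


def nmfk_stats(X, k, n_runs=10, noise=0.03, seed=0):
    """(min cluster silhouette, mean relative error) for one candidate k; needs k >= 2."""
    rng = np.random.default_rng(seed)
    Ws, errs = [], []
    for _ in range(n_runs):
        # Assumed bootstrap scheme: scale each entry by a uniform factor in [1-noise, 1+noise].
        Xb = X * rng.uniform(1 - noise, 1 + noise, size=X.shape)
        W, H = nmf_mu(Xb, k, rng)
        # Reconstruction error is measured against the original, unperturbed X.
        errs.append(np.linalg.norm(X - W @ H) / np.linalg.norm(X))
        Ws.append(W)
    aligned = [align_columns(W, Ws[0]) for W in Ws]
    points = np.vstack([A.T for A in aligned])  # one row per latent feature per run
    labels = np.tile(np.arange(k), n_runs)      # column j of every run goes to cluster j
    return min_cluster_silhouette(points, labels), float(np.mean(errs))


# Synthetic data with exactly 3 latent features (block-structured W_true).
rng = np.random.default_rng(1)
W_true = np.zeros((21, 3))
for j in range(3):
    W_true[7 * j:7 * (j + 1), j] = rng.uniform(0.5, 1.5, 7)
H_true = rng.exponential(1.0, size=(3, 12))  # exponential entries keep the profiles distinct
X = W_true @ H_true

for k in range(2, 6):
    sil, err = nmfk_stats(X, k)
    print(f"k={k}: min silhouette={sil:.3f}, relative error={err:.3f}")
```

Each printed line pairs a stability score with a reconstruction error for one candidate k. A hand-picked rule would favor a k with both a high silhouette and a low error; the approach described in the abstract instead passes NMFk statistics to a trained MLP, and this sketch does not attempt to reproduce the exact features the authors feed it.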