Paper Title

Adaptive Graph-based Generalized Regression Model for Unsupervised Feature Selection

Authors

Huang, Yanyong; Shen, Zongxin; Cai, Fuxu; Li, Tianrui; Lv, Fengmao

Abstract

Unsupervised feature selection is an important method for reducing the dimensionality of unlabeled high-dimensional data. It helps avoid the ``curse of dimensionality'' and improves the performance of subsequent machine learning tasks such as clustering and retrieval. The key problem is how to select features that are both uncorrelated and discriminative. Many existing methods select features that are strongly discriminative but highly redundant, or vice versa, and so satisfy only one of these two criteria. Other methods choose discriminative features with low redundancy by constructing the graph matrix on the original feature space. Because the original feature space usually contains redundancy and noise, this degrades the performance of feature selection. To address these issues, we first present a novel generalized regression model equipped with an uncorrelated constraint and $\ell_{2,1}$-norm regularization. It simultaneously selects uncorrelated and discriminative features and reduces the variance of data points belonging to the same neighborhood, which is helpful for the clustering task. Furthermore, the local intrinsic structure of the data is constructed in the reduced-dimensional space by adaptively learning a similarity-induced graph. The learning of the graph structure and of the indicator matrix, based on spectral analysis, is then integrated into the generalized regression model. Finally, we develop an alternating iterative optimization algorithm to solve the objective function. A series of experiments on nine real-world data sets demonstrates the effectiveness of the proposed method in comparison with other competing approaches.
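The abstract does not state the objective function, so the following is only a minimal sketch of the $\ell_{2,1}$-norm it refers to, not the authors' model. It uses the standard definition: the sum of the Euclidean norms of the rows of a projection matrix. The matrix `W`, the helper names `l21_norm` and `rank_features`, and the practice of ranking features by row norm are illustrative assumptions. Ranking by row norm is common in $\ell_{2,1}$-regularized feature selection, but the abstract does not confirm it for this method.

```python
import numpy as np


def l21_norm(W):
    """Return the l2,1-norm of W: the sum of the Euclidean norms of its rows."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def rank_features(W, k):
    """Return the indices of the k rows of W with the largest Euclidean norm.

    Assumption: row i of W corresponds to feature i, so a row with a
    near-zero norm marks a feature that contributes little to the projection.
    """
    row_norms = np.linalg.norm(W, axis=1)
    # Negate so that argsort yields descending order; stable sort keeps ties in index order.
    order = np.argsort(-row_norms, kind="stable")
    return order[:k]


# Hypothetical projection matrix: 4 features mapped to 2 dimensions.
W = np.array([
    [3.0, 4.0],   # row norm 5
    [0.0, 0.0],   # row norm 0, a feature the penalty has zeroed out
    [0.6, 0.8],   # row norm 1
    [0.0, 2.0],   # row norm 2
])

total = l21_norm(W)
selected = rank_features(W, 2)
```

Because the penalty sums whole-row norms, minimizing it tends to drive entire rows to zero, which is why it is a natural fit for selecting a subset of features rather than individual matrix entries.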
