论文标题
用于高维数据可视化的基于拉普拉斯的集群收缩T-SNE
Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization
论文作者
论文摘要
降低降低技术旨在代表低维空间中的高维数据,以提取隐藏有用的信息,或者促进对数据的视觉理解和解释。但是,很少有人考虑高维数据中隐含的潜在群集信息。在本文中,我们提出了基于T-SNE的新图形非线性降低方法Laptsne,这是将高维数据视为2D散点图的最佳技术之一。具体而言,Laptsne在学习保留从高维空间到低维空间的局部和全球结构时,利用图形拉普拉斯式的特征值信息缩小了低维嵌入中的潜在簇。解决所提出的模型是不平凡的,因为归一化对称拉普拉斯的特征值是决策变量的函数。我们提供了一种具有收敛性保证的大型最小化算法,以解决LAPTSNE的优化问题,并显示如何分析梯度,当考虑使用Laplacian兼容的目标进行优化时,这可能会引起广泛关注。我们通过与七个基准数据集上的最新方法进行正式比较,通过视觉和已建立的定量测量值进行了正式比较。结果证明了我们方法比T-SNE和UMAP等基线的优越性。我们还为LAPTSNE提供了样本外扩展,大规模扩展和迷你批量扩展,以促进各种情况下的尺寸降低。
Dimensionality reduction techniques aim at representing high-dimensional data in low-dimensional spaces to extract hidden and useful information or facilitate visual understanding and interpretation of the data. However, few of them take into consideration the potential cluster information contained implicitly in the high-dimensional data. In this paper, we propose LaptSNE, a new graph-layout nonlinear dimensionality reduction method based on t-SNE, one of the best techniques for visualizing high-dimensional data as 2D scatter plots. Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding when learning to preserve the local and global structure from high-dimensional space to low-dimensional space. It is nontrivial to solve the proposed model because the eigenvalues of normalized symmetric Laplacian are functions of the decision variable. We provide a majorization-minimization algorithm with convergence guarantee to solve the optimization problem of LaptSNE and show how to calculate the gradient analytically, which may be of broad interest when considering optimization with Laplacian-composited objective. We evaluate our method by a formal comparison with state-of-the-art methods on seven benchmark datasets, both visually and via established quantitative measurements. The results demonstrate the superiority of our method over baselines such as t-SNE and UMAP. We also provide out-of-sample extension, large-scale extension and mini-batch extension for our LaptSNE to facilitate dimensionality reduction in various scenarios.