Paper Title
ShapeVis: High-dimensional Data Visualization at Scale
Authors
Abstract
We present ShapeVis, a scalable visualization technique for point cloud data inspired by topological data analysis. Our method captures the underlying geometric and topological structure of the data in a compressed graphical representation. The data visualization technique Mapper, which discretely approximates the Reeb graph of a filter function on the data, has been reported to be highly successful. However, when standard dimensionality reduction algorithms are used as the filter function, Mapper incurs considerable computational cost, which makes it difficult to scale to high-dimensional data. Our proposed technique relies on finding a subset of points, called landmarks, along the data manifold and constructing a weighted witness graph over them. This graph captures the structural characteristics of the point cloud, and its weights are determined using a finite Markov chain. We further compress this graph by applying induced maps from standard community detection algorithms. Using techniques borrowed from manifold tearing, we prune and reinstate edges in the induced graph based on their modularity to summarize the shape of the data. We empirically demonstrate how our technique captures the structural characteristics of real and synthetic data sets. Further, we compare our approach with Mapper using various filter functions, such as t-SNE, UMAP, and LargeVis, and show that our algorithm scales to millions of data points while preserving the quality of the data visualization.
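The abstract only outlines the pipeline. The following is a minimal sketch of its first stage, a landmark witness graph with Markov-chain weights, and it rests on several assumptions. Landmarks are chosen by max-min (farthest-point) sampling. Each data point "witnesses" an edge between its two nearest landmarks. Witness counts are row-normalized into a finite Markov chain transition matrix. All three choices, and the function names `maxmin_landmarks`, `witness_graph`, and `transition_matrix`, are hypothetical stand-ins and not the paper's exact definitions. The community-detection and modularity-based pruning steps are not shown.

```python
import math
import random
from collections import defaultdict


def maxmin_landmarks(points, k, seed=0):
    """Pick up to k landmark indices by greedy max-min (farthest-point) sampling.

    Assumption: a common stand-in for landmark selection, not necessarily ShapeVis's.
    """
    k = min(k, len(points))
    rng = random.Random(seed)
    first = rng.randrange(len(points))
    landmarks = [first]
    # Distance from every point to its closest landmark chosen so far.
    dist = [math.dist(p, points[first]) for p in points]
    while len(landmarks) < k:
        nxt = max(range(len(points)), key=lambda i: dist[i])
        landmarks.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
    return landmarks


def witness_graph(points, landmarks):
    """Join two landmarks whenever some point has them as its two nearest landmarks.

    Returns {(a, b): count} with a < b, where count is the number of witnesses.
    """
    if len(landmarks) < 2:
        raise ValueError("need at least two landmarks")
    counts = defaultdict(int)
    for p in points:
        nearest = sorted(landmarks, key=lambda l: math.dist(p, points[l]))[:2]
        a, b = sorted(nearest)
        counts[(a, b)] += 1
    return dict(counts)


def transition_matrix(edges, landmarks):
    """Row-normalize symmetric witness counts into a Markov transition matrix.

    Hypothetical weighting: rows of landmarks with no edges stay all zero.
    """
    idx = {l: i for i, l in enumerate(landmarks)}
    n = len(landmarks)
    P = [[0.0] * n for _ in range(n)]
    for (a, b), c in edges.items():
        P[idx[a]][idx[b]] += c
        P[idx[b]][idx[a]] += c
    for row in P:
        s = sum(row)
        if s > 0:
            for j in range(n):
                row[j] /= s
    return P


if __name__ == "__main__":
    # Toy data: ten points evenly spaced along a line.
    pts = [(float(x), 0.0) for x in range(10)]
    lms = maxmin_landmarks(pts, 3)
    E = witness_graph(pts, lms)
    print(lms, E)
    print(transition_matrix(E, lms))
```

On points lying along a line, a witness can never skip over the middle landmark, so the edges only join neighboring landmarks. The resulting graph is a path, which is the kind of coarse shape summary the method aims to recover.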