Paper title
Stable and consistent density-based clustering via multiparameter persistence
Paper authors
Paper abstract
We consider the degree-Rips construction from topological data analysis, which provides a density-sensitive, multiparameter hierarchical clustering algorithm. We analyze its stability to perturbations of the input data using the correspondence-interleaving distance, a metric for hierarchical clusterings that we introduce. Taking certain one-parameter slices of degree-Rips recovers well-known methods for density-based clustering, but we show that these methods are unstable. However, we prove that degree-Rips, as a multiparameter object, is stable, and we propose an alternative approach for taking slices of degree-Rips, which yields a one-parameter hierarchical clustering algorithm with better stability properties. We prove that this algorithm is consistent with respect to the correspondence-interleaving distance. We provide an algorithm for extracting a single clustering from a one-parameter hierarchical clustering, and this algorithm is stable with respect to the correspondence-interleaving distance. We also integrate these methods into a pipeline for density-based clustering, which we call Persistable. Adapting tools from multiparameter persistent homology, we propose visualization tools that guide the selection of all parameters of the pipeline. We demonstrate Persistable on benchmark data sets, showing that it identifies multi-scale cluster structure in data.
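To make the two-parameter idea concrete, here is a minimal sketch of clustering at a single pair of degree-Rips parameters. It is an illustration under stated assumptions, not the Persistable implementation: the function name `degree_rips_clusters`, the Euclidean metric, and the exact degree convention (keep a point if at least `k` other points lie within distance `s`) are all choices made here and may differ from the paper's normalization.

```python
from itertools import combinations
from math import dist


def degree_rips_clusters(points, s, k):
    """Connected components of a degree-Rips graph at scale s, degree threshold k.

    Assumed convention (not taken from the abstract): two points are joined
    when their Euclidean distance is at most s, and a point is kept only if
    it has at least k such neighbors.
    """
    n = len(points)
    neighbors = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if dist(points[i], points[j]) <= s:
            neighbors[i].add(j)
            neighbors[j].add(i)

    # Density filter: discard points whose degree in the Rips graph is below k.
    kept = {i for i in range(n) if len(neighbors[i]) >= k}

    # Depth-first search over the surviving vertices to collect components.
    clusters, seen = [], set()
    for start in sorted(kept):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend((neighbors[v] & kept) - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical toy data: a dense square, a sparser triangle, and one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (5, 5)]
print(degree_rips_clusters(points, s=1.5, k=2))
print(degree_rips_clusters(points, s=1.5, k=3))
```

Varying `s` and `k` together traces out the multiparameter hierarchy; fixing a one-parameter path through the `(s, k)` plane gives the kind of slice the abstract refers to.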