On Metric DBSCAN with Low Doubling Dimension

Paper Title

On Metric DBSCAN with Low Doubling Dimension

Authors

Ding, Hu, Yang, Fan

Abstract

The density-based clustering method Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a popular method for outlier recognition and has received tremendous attention from many different areas. A major issue with the original DBSCAN is that its time complexity can be as large as quadratic. Most existing DBSCAN algorithms focus on developing efficient index structures to speed up the procedure in low-dimensional Euclidean space. However, to the best of our knowledge, research on DBSCAN in high-dimensional Euclidean space or general metric spaces is still quite limited. In this paper, we consider the metric DBSCAN problem under the assumption that the inliers (the data excluding outliers) have a low doubling dimension. We apply a novel randomized $k$-center clustering idea to reduce the complexity of range queries, the most time-consuming step in the whole DBSCAN procedure. Our proposed algorithms do not need to build any complicated data structures and are easy to implement in practice. Experimental results show that our algorithms can significantly outperform existing DBSCAN algorithms in terms of running time.
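
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea that a set of centers can prune range queries in an arbitrary metric space. It is not the authors' method. It assumes a greedy farthest-point cover with a hypothetical `radius` parameter, and the names `build_cover` and `range_query` are illustrative. If every point lies within `radius` of its center, the triangle inequality implies that two points within `eps` of each other have centers at most `2 * radius + eps` apart, so all other groups can be skipped.

```python
import random


def build_cover(points, dist, radius, seed=0):
    """Greedy farthest-point cover: every point ends within `radius` of its center.

    Illustrative helper, not taken from the paper. Assumes `points` is non-empty
    and `dist` is a metric. Returns the center indices, the center assigned to
    each point, and the members of each center's group.
    """
    rng = random.Random(seed)
    first = rng.randrange(len(points))
    centers = [first]
    nearest = [first] * len(points)                  # assigned center per point
    gap = [dist(points[first], p) for p in points]   # distance to that center
    while True:
        far = max(range(len(points)), key=gap.__getitem__)
        if gap[far] <= radius:
            break                                    # every point is covered
        centers.append(far)
        for i, p in enumerate(points):
            d = dist(points[far], p)
            if d < gap[i]:
                gap[i], nearest[i] = d, far
    members = {c: [] for c in centers}
    for i, c in enumerate(nearest):
        members[c].append(i)
    return centers, nearest, members


def range_query(i, points, dist, eps, radius, centers, nearest, members):
    """Return indices of all points within `eps` of point i (including i)."""
    own = nearest[i]
    found = []
    for c in centers:
        # Groups whose center is farther than 2 * radius + eps from the query's
        # center cannot contain a neighbor, by the triangle inequality.
        if dist(points[own], points[c]) > 2 * radius + eps:
            continue
        for j in members[c]:
            if dist(points[i], points[j]) <= eps:
                found.append(j)
    return found
```

The result of `range_query` matches a brute-force scan over all points; the saving comes from skipping whole groups, which is most effective when the data needs few centers at the chosen radius.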

Paper Title