Paper Title
A New Index for Clustering Evaluation Based on Density Estimation
Paper Authors
Paper Abstract
A new index for the internal evaluation of clustering is introduced. The index is defined as a mixture of two sub-indices: the first, $I_a$, is called the Ambiguous Index, and the second, $I_s$, is called the Similarity Index. Both sub-indices are computed from a density estimate of each cluster in a partition of the data. An experiment on a set of 145 datasets tests the performance of the new index and compares it with six other internal clustering evaluation indices: the Calinski-Harabasz index, the Silhouette coefficient, the Davies-Bouldin index, CDbw, DBCV, and VIASCKDE. The results show that the new index significantly improves on the other internal clustering evaluation indices.
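The abstract does not define $I_a$ or $I_s$, so the sketch below illustrates only the step they are said to build on: estimating a separate density for each cluster of a partition. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian kernel, the fixed bandwidth, the restriction to one-dimensional data, the toy values, and the hypothetical helper names `gaussian_kde_1d` and `per_cluster_densities`.

```python
import math
from collections import defaultdict


def gaussian_kde_1d(samples, bandwidth):
    """Return a function x -> estimated density (Gaussian kernel, assumed)."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, then normalise.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def per_cluster_densities(points, labels, bandwidth=0.5):
    """Fit one density estimate per cluster of a partition (hypothetical helper)."""
    clusters = defaultdict(list)
    for x, label in zip(points, labels):
        clusters[label].append(x)
    return {
        label: gaussian_kde_1d(xs, bandwidth) for label, xs in clusters.items()
    }


# Toy 1-D data with two well-separated clusters (made-up values).
points = [0.0, 0.2, 0.4, 5.0, 5.1, 5.3]
labels = [0, 0, 0, 1, 1, 1]
densities = per_cluster_densities(points, labels)

# Evaluate both cluster densities at a point that belongs to cluster 0.
d0_near = densities[0](0.2)
d1_near = densities[1](0.2)
print(d0_near > d1_near)
```

In a well-separated partition such as this one, a point is much denser under its own cluster's estimate than under the other's. How the paper turns such per-cluster densities into the two sub-indices is not stated in the abstract and is not reproduced here.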