Paper Title

Clustering by Direct Optimization of the Medoid Silhouette

Authors

Lars Lenssen, Erich Schubert

Abstract

Evaluating clustering results is difficult and depends heavily on the data set and on the perspective of the beholder. Many clustering quality measures exist that try to provide a general measure for validating clustering results. A very popular one is the Silhouette. We discuss the efficient medoid-based variant of the Silhouette, perform a theoretical analysis of its properties, and provide two fast versions for its direct optimization. We combine ideas from the original Silhouette with the well-known PAM algorithm and its latest improvement, FasterPAM. One of the versions guarantees results equal to those of the original variant and provides a run time speedup of $O(k^2)$. In experiments on real data with 30000 samples and $k=100$, we observed a 10464$\times$ speedup compared to the original PAMMEDSIL algorithm.
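The abstract does not spell out the medoid-based Silhouette, so the following is only a minimal sketch under an assumed definition: each point scores $1 - d_1/d_2$, where $d_1$ and $d_2$ are its distances to the nearest and second-nearest medoid, and the scores are averaged. The function name `medoid_silhouette`, the Euclidean distance, and the handling of $d_2 = 0$ are illustrative choices, not taken from the paper. It evaluates a fixed set of medoids and does not implement the PAM-style optimization the abstract describes.

```python
import math


def medoid_silhouette(points, medoids):
    """Average medoid Silhouette of `points` for a given list of medoids.

    Assumed definition (not stated in the abstract): s_i = 1 - d1 / d2,
    where d1 and d2 are the distances from point i to its nearest and
    second-nearest medoid. Euclidean distance is an illustrative choice.
    """
    if len(medoids) < 2:
        raise ValueError("need at least two medoids")
    total = 0.0
    for p in points:
        # Two smallest distances from p to any medoid.
        d1, d2 = sorted(math.dist(p, m) for m in medoids)[:2]
        # Convention chosen for this sketch only: score 0 when d2 == 0.
        total += (1.0 - d1 / d2) if d2 > 0 else 0.0
    return total / len(points)


# Toy data: two well-separated groups on a line, one medoid per group.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
medoids = [(0.0, 0.0), (10.0, 0.0)]
score = medoid_silhouette(points, medoids)
```

Because each point only needs its two nearest medoids, the score costs one pass over the point-to-medoid distances, whereas the original Silhouette needs distances between all pairs of points.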
