k-sums: another side of k-means

Paper title

k-sums: another side of k-means

Authors

Wan-Lei Zhao, Run-Qing Chen, Hui Ye, Chong-Wah Ngo

Abstract

This paper revisits the decades-old clustering method k-means. The original distortion minimization model of k-means is solved by a purely stochastic minimization procedure. In each step of the iteration, one sample is tentatively reallocated from its current cluster to another. The move is accepted whenever it brings the sample closer to the new centroid. This optimization procedure converges faster, and to a better local minimum, than k-means and many of its variants. This fundamental modification of the k-means loop leads to a redefinition of a family of k-means variants. Moreover, a new objective function that minimizes the sum of pairwise distances within clusters is presented, and it is shown that it can be solved with the same stochastic optimization procedure. Built upon these two minimization models, the procedure outperforms k-means and its variants considerably under different settings and on different datasets.
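
The one-sample-at-a-time loop described in the abstract can be sketched roughly as follows. This is a minimal illustration based only on a literal reading of the abstract, not the authors' implementation: the function names `distortion` and `reallocate`, the parameters `n_epochs` and `seed`, the squared Euclidean distance, and the rule of never emptying a cluster are all assumptions, and the paper itself may use a different acceptance test.

```python
import numpy as np


def distortion(X, labels, k):
    """Sum of squared distances from each sample to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = X[labels == c]
        if len(members) > 0:
            total += ((members - members.mean(axis=0)) ** 2).sum()
    return float(total)


def reallocate(X, labels, k, n_epochs=20, seed=0):
    """Hypothetical sketch: visit samples in random order, move one at a time.

    Assumes every cluster in `labels` starts non-empty.
    """
    rng = np.random.default_rng(seed)
    labels = np.array(labels, dtype=int)          # work on a copy
    counts = np.bincount(labels, minlength=k).astype(float)
    sums = np.zeros((k, X.shape[1]))
    np.add.at(sums, labels, X)                    # per-cluster coordinate sums

    for _ in range(n_epochs):
        moved = 0
        for i in rng.permutation(len(X)):
            src = labels[i]
            if counts[src] <= 1:
                continue                          # assumption: never empty a cluster
            centroids = sums / counts[:, None]
            d = ((centroids - X[i]) ** 2).sum(axis=1)
            dst = int(np.argmin(d))
            if d[dst] < d[src]:                   # sample is closer to another centroid
                sums[src] -= X[i]
                counts[src] -= 1
                sums[dst] += X[i]
                counts[dst] += 1
                labels[i] = dst
                moved += 1
        if moved == 0:                            # no sample moved: local minimum
            break
    return labels
```

Keeping per-cluster sums and counts lets the centroids be refreshed after every single move, which is what separates this loop from the batch assign-then-update cycle of standard k-means. A call such as `reallocate(X, np.arange(len(X)) % k, k)` starts from a round-robin assignment; comparing `distortion` before and after gives a quick sanity check.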
