Paper title
Bregman Power k-Means for Clustering Exponential Family Data
Paper authors
Paper abstract
Recent progress in center-based clustering algorithms combats poor local minima by implicit annealing, using a family of generalized means. These methods are variations of Lloyd's celebrated $k$-means algorithm, and are most appropriate for spherical clusters such as those arising from Gaussian data. In this paper, we bridge these algorithmic advances to classical work on hard clustering under Bregman divergences, which enjoy a bijection to exponential family distributions and are thus well-suited for clustering objects arising from a breadth of data generating mechanisms. The elegant properties of Bregman divergences allow us to maintain closed form updates in a simple and transparent algorithm, and moreover lead to new theoretical arguments for establishing finite sample bounds that relax the bounded support assumption made in the existing state of the art. Additionally, we consider thorough empirical analyses on simulated experiments and a case study on rainfall data, finding that the proposed method outperforms existing peer methods in a variety of non-Gaussian data settings.
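To make the idea in the abstract concrete, the following is a minimal sketch of how a power-mean objective can be combined with a Bregman divergence while keeping closed-form center updates. It is an illustration under stated assumptions, not the authors' implementation: the function names (`gen_kl`, `power_mean`, `bregman_power_kmeans`), the annealing factor `eta`, the starting power `s = -1`, and the floor `eps` are all hypothetical choices, and the generalized KL divergence (which corresponds to Poisson-type data) is just one example of a Bregman divergence. The sketch relies on two standard facts: the power mean with s <= 1 is concave, so its tangent plane gives a majorizing surrogate, and a weighted sum of Bregman divergences is minimized over its second argument by the weighted mean.

```python
import math

def gen_kl(x, theta):
    """Generalized KL (I-divergence) between strictly positive vectors.
    This is the Bregman divergence generated by phi(v) = sum(v * log v)."""
    return sum(xi * math.log(xi / ti) - xi + ti for xi, ti in zip(x, theta))

def power_mean(values, s):
    """M_s(y) = (mean(y_j ** s)) ** (1 / s); for s < 0 it tends to min(y) as s -> -inf."""
    return (sum(v ** s for v in values) / len(values)) ** (1.0 / s)

def bregman_power_kmeans(X, centers, divergence, s=-1.0, eta=1.1, iters=50, eps=1e-10):
    """Assumed MM scheme: weight each point by the gradient of the power mean
    of its divergences, update centers as weighted means, then anneal s."""
    k, dim = len(centers), len(X[0])
    centers = [list(c) for c in centers]
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for x in X:
            d = [max(divergence(x, c), eps) for c in centers]
            d_min = min(d)
            # The gradient of M_s is scale invariant, so divide by the smallest
            # divergence first; this avoids overflow when s is very negative.
            r = [v / d_min for v in d]
            mean_rs = sum(v ** s for v in r) / k
            for j in range(k):
                # w_j = dM_s/dd_j = (1/k) * d_j**(s-1) * mean(d**s)**(1/s - 1)
                w = (r[j] ** (s - 1.0)) * (mean_rs ** (1.0 / s - 1.0)) / k
                totals[j] += w
                for t in range(dim):
                    sums[j][t] += w * x[t]
        for j in range(k):
            if totals[j] > 0.0:  # keep the old center if all weights underflowed
                centers[j] = [sums[j][t] / totals[j] for t in range(dim)]
        s *= eta  # anneal toward -infinity, where M_s approaches the minimum
    return centers

# Two well-separated groups of positive 2-D points (made-up illustration data).
X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [10.0, 12.0], [11.0, 11.5], [9.5, 12.5]]
fitted = bregman_power_kmeans(X, [[2.0, 2.0], [8.0, 8.0]], gen_kl)
```

As the power `s` decreases, the weights concentrate on the nearest center, so the updates behave increasingly like hard Bregman clustering; the early, softer iterations are what provide the implicit annealing effect the abstract refers to.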