Title
A generalized Bayes framework for probabilistic clustering
Authors
Abstract
Loss-based clustering methods, such as k-means and its variants, are standard tools for finding groups in data. However, the lack of quantification of uncertainty in the estimated clusters is a disadvantage. Model-based clustering based on mixture models provides an alternative, but such methods face computational problems and are highly sensitive to the choice of kernel. This article proposes a generalized Bayes framework that bridges between these two paradigms through the use of Gibbs posteriors. In conducting Bayesian updating, the log likelihood is replaced by a loss function for clustering, leading to a rich family of clustering methods. The Gibbs posterior represents a coherent updating of Bayesian beliefs without needing to specify a likelihood for the data, and can be used for characterizing uncertainty in clustering. We consider losses based on Bregman divergence and pairwise similarities, and develop efficient deterministic algorithms for point estimation along with sampling algorithms for uncertainty quantification. Several existing clustering algorithms, including k-means, can be interpreted as generalized Bayes estimators under our framework, and hence we provide a method of uncertainty quantification for these approaches.
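To make the idea concrete, the abstract's replacement of the log likelihood by a clustering loss can be sketched as follows: with a uniform prior over labels, the Gibbs posterior over cluster assignments is proportional to exp(−λ × k-means loss), and labels can be resampled coordinate-wise. This is only a minimal illustrative sketch, not the paper's algorithms; the toy data, the number of clusters `K`, and the loss scale `lam` (λ) are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated 1-D groups (illustrative, not from the paper).
x = np.concatenate([rng.normal(-3, 0.5, 20), rng.normal(3, 0.5, 20)])
n, K = len(x), 2
lam = 2.0                       # loss scale lambda; an assumed tuning value
labels = rng.integers(0, K, n)  # random initial cluster assignments

def centroids(labels):
    # Cluster means; fall back to 0.0 for an empty cluster.
    return np.array([x[labels == k].mean() if np.any(labels == k) else 0.0
                     for k in range(K)])

# Gibbs-style sweeps: resample each label with probability proportional to
# exp(-lambda * squared distance to each centroid), i.e. a Gibbs posterior
# built from the k-means loss under a uniform prior over assignments.
for _ in range(50):
    mu = centroids(labels)
    for i in range(n):
        logp = -lam * (x[i] - mu) ** 2
        p = np.exp(logp - logp.max())       # stabilized softmax weights
        labels[i] = rng.choice(K, p=p / p.sum())
```

Repeating such sweeps and retaining the sampled label vectors gives a Monte Carlo picture of clustering uncertainty; as λ grows, the sampler concentrates on the k-means point estimate.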