Paper Title
On the Role of Channel Capacity in Learning Gaussian Mixture Models
Authors
Abstract
This paper studies the sample complexity of learning the $k$ unknown centers of a balanced Gaussian mixture model (GMM) in $\mathbb{R}^d$ with spherical covariance matrix $\sigma^2\mathbf{I}$. In particular, we are interested in the following question: what is the maximal noise level $\sigma^2$, for which the sample complexity is essentially the same as when estimating the centers from labeled measurements? To that end, we restrict attention to a Bayesian formulation of the problem, where the centers are uniformly distributed on the sphere $\sqrt{d}\mathcal{S}^{d-1}$. Our main results characterize the exact noise threshold $\sigma^2$ below which the GMM learning problem, in the large system limit $d,k\to\infty$, is as easy as learning from labeled observations, and above which it is substantially harder. The threshold occurs at $\frac{\log k}{d} = \frac12\log\left( 1+\frac{1}{\sigma^2} \right)$, which is the capacity of the additive white Gaussian noise (AWGN) channel. Thinking of the set of $k$ centers as a code, this noise threshold can be interpreted as the largest noise level for which the error probability of the code over the AWGN channel is small. Previous works on the GMM learning problem have identified the minimum distance between the centers as a key parameter in determining the statistical difficulty of learning the corresponding GMM. While our results are only proved for GMMs whose centers are uniformly distributed over the sphere, they hint that perhaps it is the decoding error probability associated with the center constellation as a channel code that determines the statistical difficulty of learning the corresponding GMM, rather than just the minimum distance.
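The threshold condition $\frac{\log k}{d} = \frac12\log\left(1+\frac{1}{\sigma^2}\right)$ can be solved in closed form for the critical noise level, giving $\sigma^2 = 1/(k^{2/d} - 1)$. The following sketch (an illustration of the stated formula, not code from the paper; the function names are ours) checks that at this noise level the "rate" $\log k / d$ of the center constellation exactly matches the AWGN channel capacity:

```python
import math


def noise_threshold(k: int, d: int) -> float:
    """Critical noise variance sigma^2 solving
    log(k)/d = 0.5*log(1 + 1/sigma^2)  (natural logs),
    i.e. sigma^2 = 1 / (k**(2/d) - 1)."""
    return 1.0 / (k ** (2.0 / d) - 1.0)


def awgn_capacity(sigma2: float) -> float:
    """AWGN channel capacity in nats per channel use."""
    return 0.5 * math.log(1.0 + 1.0 / sigma2)


# Example: k = 2**20 centers in dimension d = 100.
k, d = 2 ** 20, 100
s2 = noise_threshold(k, d)

# At the threshold, the rate log(k)/d equals the capacity.
assert abs(math.log(k) / d - awgn_capacity(s2)) < 1e-9
```

Below this $\sigma^2$ the constellation operates below capacity (labels are essentially recoverable from the observations), and above it the unlabeled problem becomes substantially harder.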