Paper Title
Learning Shared Kernel Models: the Shared Kernel EM algorithm
Paper Authors
Paper Abstract
Expectation maximisation (EM) is an unsupervised learning method for estimating the parameters of a finite mixture distribution. It works by introducing "hidden" or "latent" variables via Baum's auxiliary function $Q$, which allows the joint data likelihood to be expressed as a product of simple factors. The relevance of EM has increased since the introduction of the variational lower bound (VLB): the VLB differs from Baum's auxiliary function only by the entropy of the PDF of the latent variables $Z$. We first present a rederivation of the standard EM algorithm using data association ideas from the field of multiple target tracking, using $K$-valued scalar data association hypotheses rather than the usual binary indicator vectors. The same method is then applied to a little-known but much more general type of supervised EM algorithm for shared kernel models, related to probabilistic radial basis function networks. We address a number of shortcomings in the derivations that have been published previously in this area. In particular, we give theoretically rigorous derivations of (i) the complete data likelihood; (ii) Baum's auxiliary function (the E-step); and (iii) the maximisation (the M-step) in the case of Gaussian shared kernel models. The resulting algorithm, called shared kernel EM (SKEM), is then applied to a digit recognition problem using a novel 7-segment digit representation. Variants of the algorithm that use different numbers of features and different EM algorithm dimensions are compared in terms of mean accuracy and mean IoU. A simplified classifier is proposed that decomposes the joint data PDF as a product of lower-order PDFs over non-overlapping subsets of variables. The effect of different numbers of assumed mixture components $K$ is also investigated. High-level source code for the data generation and SKEM algorithm is provided.
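To make the unsupervised EM setting concrete, the following is a minimal, illustrative sketch of EM for a 1-D Gaussian mixture with $K$ components (standard textbook EM, not the paper's SKEM algorithm; the function name and initialisation scheme are our own choices). The E-step computes the responsibilities $p(z_n = k \mid x_n)$, and the M-step maximises Baum's auxiliary function $Q$ in closed form.

```python
import numpy as np

def em_gmm_1d(x, K, n_iter=100):
    """Illustrative EM for a 1-D Gaussian mixture (not the paper's SKEM).

    Returns mixing weights pi, means mu and variances var, each of shape (K,).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Deterministic initialisation: spread the means over data quantiles.
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)
    var = np.full(K, np.var(x))
    for _ in range(n_iter):
        # E-step: log responsibilities, computed stably per row.
        log_g = (np.log(pi)
                 - 0.5 * np.log(2.0 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_g -= log_g.max(axis=1, keepdims=True)
        g = np.exp(log_g)
        g /= g.sum(axis=1, keepdims=True)          # gamma[n, k] = p(z_n = k | x_n)
        # M-step: closed-form maximisation of Baum's auxiliary function Q.
        Nk = g.sum(axis=0)
        pi = Nk / n
        mu = (g * x[:, None]).sum(axis=0) / Nk
        var = (g * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return pi, mu, var
```

For well-separated components the estimated means converge to the component centres; SKEM extends this kind of update to supervised shared kernel models, where the mixture components (kernels) are shared across classes.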