Paper Title
Learning Shared Kernel Models: the Shared Kernel EM algorithm
Paper Authors
Paper Abstract
Expectation maximisation (EM) is an unsupervised learning method for estimating the parameters of a finite mixture distribution. It works by introducing "hidden" or "latent" variables via Baum's auxiliary function $Q$, which allows the joint data likelihood to be expressed as a product of simple factors. The relevance of EM has increased since the introduction of the variational lower bound (VLB): the VLB differs from Baum's auxiliary function only by the entropy of the PDF of the latent variables $Z$. We first present a rederivation of the standard EM algorithm using data association ideas from the field of multiple target tracking, using $K$-valued scalar data association hypotheses rather than the usual binary indicator vectors. The same method is then applied to a little-known but much more general type of supervised EM algorithm for shared kernel models, related to probabilistic radial basis function networks. We address a number of shortcomings in the derivations that have been published previously in this area. In particular, we give theoretically rigorous derivations of (i) the complete data likelihood; (ii) Baum's auxiliary function (the E-step); and (iii) the maximisation (the M-step) in the case of Gaussian shared kernel models. The resulting algorithm, called shared kernel EM (SKEM), is then applied to a digit recognition problem using a novel 7-segment digit representation. Variants of the algorithm that use different numbers of features and different EM algorithm dimensions are compared in terms of mean accuracy and mean IoU. A simplified classifier is proposed that decomposes the joint data PDF as a product of lower-order PDFs over non-overlapping subsets of variables. The effect of different numbers of assumed mixture components $K$ is also investigated. High-level source code for the data generation and SKEM algorithm is provided.
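To make the unsupervised EM setting concrete, the following is a minimal, illustrative sketch of EM for a 1-D Gaussian mixture with $K$ components (standard textbook EM, not the paper's SKEM algorithm; the function name and initialisation scheme are our own choices). The E-step computes the responsibilities $p(z_n = k \mid x_n)$, and the M-step maximises Baum's auxiliary function $Q$ in closed form.

```python
import numpy as np

def em_gmm_1d(x, K, n_iter=100):
    """Illustrative EM for a 1-D Gaussian mixture (not the paper's SKEM).

    Returns mixing weights pi, means mu and variances var, each of shape (K,).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Deterministic initialisation: spread the means over data quantiles.
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)
    var = np.full(K, np.var(x))
    for _ in range(n_iter):
        # E-step: log responsibilities, computed stably per row.
        log_g = (np.log(pi)
                 - 0.5 * np.log(2.0 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_g -= log_g.max(axis=1, keepdims=True)
        g = np.exp(log_g)
        g /= g.sum(axis=1, keepdims=True)          # gamma[n, k] = p(z_n = k | x_n)
        # M-step: closed-form maximisation of Baum's auxiliary function Q.
        Nk = g.sum(axis=0)
        pi = Nk / n
        mu = (g * x[:, None]).sum(axis=0) / Nk
        var = (g * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return pi, mu, var
```

For well-separated components the estimated means converge to the component centres; SKEM extends this kind of update to supervised shared kernel models, where the mixture components (kernels) are shared across classes.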