Title
A subsampling approach for Bayesian model selection
Authors
Abstract
It is common practice to use Laplace approximations to compute marginal likelihoods in Bayesian versions of generalised linear models (GLMs). Marginal likelihoods combined with model priors are then used in different search algorithms to compute the posterior marginal probabilities of models and individual covariates. This allows performing Bayesian model selection and model averaging. For large sample sizes, even the Laplace approximation becomes computationally challenging because the optimisation routine involved needs to evaluate the likelihood on the full set of data in multiple iterations. As a consequence, the algorithm is not scalable for large datasets. To address this problem, we suggest using a version of a popular batch stochastic gradient descent (BSGD) algorithm for estimating the marginal likelihood of a GLM by subsampling from the data. We further combine the algorithm with Markov chain Monte Carlo (MCMC)-based methods for Bayesian model selection and provide some theoretical results on the convergence of the estimates. Finally, we report results from experiments illustrating the performance of the proposed algorithm.
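The general idea described in the abstract can be illustrated with a minimal sketch (this is not the paper's actual algorithm): a Laplace approximation to the log marginal likelihood of a Bayesian logistic regression, one member of the GLM family, where the posterior mode is found by minibatch (subsampled) gradient ascent instead of a full-data optimisation routine. The Gaussian prior, the function names, and all tuning constants below are illustrative assumptions.

```python
import numpy as np

def log_joint(theta, X, y, tau2):
    """Unnormalised log posterior: Bernoulli log-likelihood + N(0, tau2 I) prior."""
    eta = X @ theta
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    logprior = -0.5 * (theta @ theta) / tau2 - 0.5 * len(theta) * np.log(2 * np.pi * tau2)
    return loglik + logprior

def map_by_minibatch_sgd(X, y, tau2, batch=64, lr=0.1, epochs=50, seed=0):
    """Posterior mode via subsampled gradient ascent.

    Each step evaluates the likelihood gradient on a random minibatch only;
    rescaling by n/batch makes it an unbiased estimate of the full-data
    gradient, which is the source of the computational savings."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch):
            sub = idx[start:start + batch]
            p = 1.0 / (1.0 + np.exp(-X[sub] @ theta))
            grad_lik = (X[sub].T @ (y[sub] - p)) * (n / len(sub))
            grad = grad_lik - theta / tau2          # add prior gradient
            theta = theta + lr * grad / n           # stepsize scaled by 1/n
    return theta

def laplace_log_marginal(X, y, tau2, theta_hat):
    """Laplace approximation:
    log p(y) ~= log p(y, theta_hat) + (d/2) log(2*pi) - (1/2) log|H|,
    with H the negative Hessian of the log posterior at the mode."""
    d = X.shape[1]
    p = 1.0 / (1.0 + np.exp(-X @ theta_hat))
    W = p * (1 - p)
    H = X.T @ (X * W[:, None]) + np.eye(d) / tau2
    _, logdet = np.linalg.slogdet(H)
    return log_joint(theta_hat, X, y, tau2) + 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet
```

In this sketch only the mode-finding step is subsampled; the Hessian is assembled in a single full-data pass, which is cheap relative to running many full-data optimisation iterations. Comparing `laplace_log_marginal` across candidate covariate subsets (plus model priors) is what a model-search procedure, such as an MCMC search over models, would then consume.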