Paper Title

Convergence of Sparse Variational Inference in Gaussian Processes Regression

Authors

David R. Burt, Carl Edward Rasmussen, Mark van der Wilk

Abstract

Gaussian processes are distributions over functions that are versatile and mathematically convenient priors in Bayesian modelling. However, their use is often impeded for data with large numbers of observations, $N$, due to the cubic (in $N$) cost of matrix operations used in exact inference. Many solutions have been proposed that rely on $M \ll N$ inducing variables to form an approximation at a cost of $\mathcal{O}(NM^2)$. While the computational cost appears linear in $N$, the true complexity depends on how $M$ must scale with $N$ to ensure a certain quality of the approximation. In this work, we investigate upper and lower bounds on how $M$ needs to grow with $N$ to ensure high quality approximations. We show that we can make the KL-divergence between the approximate model and the exact posterior arbitrarily small for a Gaussian-noise regression model with $M\ll N$. Specifically, for the popular squared exponential kernel and $D$-dimensional Gaussian distributed covariates, $M=\mathcal{O}((\log N)^D)$ suffice and a method with an overall computational cost of $\mathcal{O}(N(\log N)^{2D}(\log\log N)^2)$ can be used to perform inference.
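
As a rough illustration of the kind of method the abstract refers to, the sketch below evaluates a collapsed variational lower bound for sparse GP regression with $M$ inducing points (in the style of Titsias, 2009), using a squared exponential kernel and Gaussian-distributed inputs. This is not the authors' code; the kernel hyperparameters, noise variance, inducing-point locations, and toy data are illustrative assumptions. The point of the sketch is that every expensive operation involves at most an $M \times N$ matrix, which is where the $\mathcal{O}(NM^2)$ cost comes from.

```python
import numpy as np

def se_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared exponential kernel: k(a, b) = variance * exp(-||a - b||^2 / (2 * lengthscale^2))
    sqdist = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def sgpr_elbo(X, y, Z, noise_var=0.1):
    # Collapsed variational lower bound for sparse GP regression (Titsias-style sketch).
    # X: (N, D) inputs, y: (N,) targets, Z: (M, D) inducing inputs.
    # Every matrix product below involves at most an M x N matrix, so the
    # dominant cost is O(N M^2) rather than the O(N^3) of exact inference.
    N, M = X.shape[0], Z.shape[0]
    Kmm = se_kernel(Z, Z) + 1e-8 * np.eye(M)           # M x M
    Kmn = se_kernel(Z, X)                              # M x N
    L = np.linalg.cholesky(Kmm)                        # O(M^3)
    A = np.linalg.solve(L, Kmn) / np.sqrt(noise_var)   # M x N, the O(N M^2) step
    B = A @ A.T + np.eye(M)                            # M x M
    LB = np.linalg.cholesky(B)
    c = np.linalg.solve(LB, A @ y) / np.sqrt(noise_var)
    # log N(y | 0, Q_nn + noise_var * I) via the Woodbury identity
    bound = -0.5 * N * np.log(2.0 * np.pi * noise_var)
    bound -= np.sum(np.log(np.diag(LB)))
    bound -= 0.5 * (y @ y) / noise_var
    bound += 0.5 * (c @ c)
    # Trace correction -(1 / (2 noise_var)) * tr(K_nn - Q_nn); tr(K_nn) = N for unit kernel variance
    bound -= 0.5 * (N - noise_var * np.sum(A * A)) / noise_var
    return bound

# Toy usage: N = 1000 one-dimensional Gaussian-distributed inputs, M = 20 inducing points.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.normal(size=1000)
Z = np.linspace(-2.0, 2.0, 20)[:, None]
print(sgpr_elbo(X, y, Z))
```

In this setting the paper's result says that, under assumptions like those above (squared exponential kernel, Gaussian covariates in $D$ dimensions), choosing $M = \mathcal{O}((\log N)^D)$ inducing points already suffices to drive the KL divergence to the exact posterior arbitrarily small.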
