Paper Title
A Universal Approximation Theorem of Deep Neural Networks for Expressing Probability Distributions
Paper Authors
Paper Abstract
This paper studies the universal approximation property of deep neural networks for representing probability distributions. Given a target distribution $\pi$ and a source distribution $p_z$, both defined on $\mathbb{R}^d$, we prove under some assumptions that there exists a deep neural network $g:\mathbb{R}^d\rightarrow \mathbb{R}$ with ReLU activation such that the push-forward measure $(\nabla g)_\# p_z$ of $p_z$ under the map $\nabla g$ is arbitrarily close to the target measure $\pi$. The closeness is measured by three classes of integral probability metrics between probability distributions: the $1$-Wasserstein distance, maximum mean discrepancy (MMD), and kernelized Stein discrepancy (KSD). We prove upper bounds for the size (width and depth) of the deep neural network in terms of the dimension $d$ and the approximation error $\varepsilon$ with respect to the three discrepancies. In particular, the size of the neural network can grow exponentially in $d$ when the $1$-Wasserstein distance is used as the discrepancy, whereas for both MMD and KSD the size of the neural network depends on $d$ at most polynomially. Our proof relies on convergence estimates of empirical measures under the aforementioned discrepancies and on semi-discrete optimal transport.
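The paper is purely theoretical, but the objects in the abstract can be illustrated numerically. The following is a minimal sketch, assuming PyTorch: it builds a small ReLU network $g:\mathbb{R}^d\rightarrow\mathbb{R}$, maps source samples $z\sim p_z$ through $\nabla g$ to obtain samples from the push-forward measure $(\nabla g)_\# p_z$, and estimates the squared MMD (with a Gaussian kernel) against samples from a toy target $\pi$. The architecture sizes, kernel choice, and target distribution here are illustrative assumptions, not the constructions from the paper's proofs.

```python
# Minimal sketch (assumes PyTorch). Illustrates the push-forward (grad g)_# p_z
# for a ReLU network g: R^d -> R and an MMD estimate against a toy target pi.
import torch

d, n = 2, 512
torch.manual_seed(0)

# g: R^d -> R, a small ReLU network (width/depth chosen for illustration only).
g = torch.nn.Sequential(
    torch.nn.Linear(d, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)

def push_forward(z):
    """Map samples z ~ p_z to (grad g)(z), i.e. samples from (grad g)_# p_z."""
    z = z.clone().requires_grad_(True)
    out = g(z).sum()
    (grad_z,) = torch.autograd.grad(out, z)
    return grad_z

def mmd2(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

z = torch.randn(n, d)              # source p_z = N(0, I_d)
x = push_forward(z)                # samples from (grad g)_# p_z
y = torch.randn(n, d) * 0.5 + 1.0  # samples from a hypothetical target pi
print(f"MMD^2 estimate: {mmd2(x, y).item():.4f}")
```

In this sketch the network is untrained, so the reported MMD merely quantifies the gap between the push-forward of a randomly initialized $\nabla g$ and the target; the theorem asserts that some ReLU network of bounded size makes this gap smaller than any prescribed $\varepsilon$.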