Paper Title

A closer look at the approximation capabilities of neural networks

Paper Authors

Chong, Kai Fong Ernest

Paper Abstract

The universal approximation theorem, in one of its most general versions, says that if we consider only continuous activation functions $\sigma$, then a standard feedforward neural network with one hidden layer is able to approximate any continuous multivariate function $f$ to any given approximation threshold $\varepsilon$, if and only if $\sigma$ is non-polynomial. In this paper, we give a direct algebraic proof of the theorem. Furthermore, we shall explicitly quantify the number of hidden units required for approximation. Specifically, if $X\subseteq \mathbb{R}^n$ is compact, then a neural network with $n$ input units, $m$ output units, and a single hidden layer with $\binom{n+d}{d}$ hidden units (independent of $m$ and $\varepsilon$), can uniformly approximate any polynomial function $f:X \to \mathbb{R}^m$ whose total degree is at most $d$ for each of its $m$ coordinate functions. In the general case that $f$ is any continuous function, we show there exists some $N\in \mathcal{O}(\varepsilon^{-n})$ (independent of $m$), such that $N$ hidden units would suffice to approximate $f$. We also show that this uniform approximation property (UAP) still holds even under seemingly strong conditions imposed on the weights. We highlight several consequences: (i) For any $\delta > 0$, the UAP still holds if we restrict all non-bias weights $w$ in the last layer to satisfy $|w| < \delta$. (ii) There exists some $\lambda > 0$ (depending only on $f$ and $\sigma$), such that the UAP still holds if we restrict all non-bias weights $w$ in the first layer to satisfy $|w| > \lambda$. (iii) If the non-bias weights in the first layer are \emph{fixed} and randomly chosen from a suitable range, then the UAP holds with probability $1$.
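
As an illustration of consequence (iii) above, here is a minimal numerical sketch (not taken from the paper): a one-hidden-layer network whose first-layer non-bias weights are fixed and randomly drawn, with only the last layer fitted, approximating a low-degree polynomial on a compact domain. The target polynomial, the tanh activation, the domain $[-1,1]^2$, and the weight-sampling ranges are all assumptions made for this demo; only the hidden-layer width $\binom{n+d}{d}$ follows the count quoted in the abstract.

```python
# Illustrative sketch (assumptions noted above, not the paper's construction):
# fixed random first-layer weights, trainable last layer only.
import numpy as np
from math import comb

rng = np.random.default_rng(0)

n, d = 2, 3                               # input dimension and polynomial degree
width = comb(n + d, d)                    # binom(n+d, d) hidden units

def f(x):                                 # example target: a degree-3 polynomial (assumption)
    return x[:, 0]**3 - 2 * x[:, 0] * x[:, 1]**2 + x[:, 1]

# Samples from the compact domain X = [-1, 1]^2.
X = rng.uniform(-1.0, 1.0, size=(2000, n))
y = f(X)

# First layer: fixed random non-bias weights W and biases b (never trained).
W = rng.uniform(1.0, 2.0, size=(n, width))    # illustrative range, bounded away from 0
b = rng.uniform(-1.0, 1.0, size=width)

H = np.tanh(X @ W + b)                    # hidden activations, sigma = tanh (assumption)

# Only the last layer is fitted, via linear least squares on the fixed features.
design = np.hstack([H, np.ones((len(H), 1))])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

pred = design @ coef
print("max abs error on samples:", np.max(np.abs(pred - y)))
```

Because the first-layer weights stay fixed, fitting the last layer is just a linear least-squares problem over the random hidden features, which is what makes a "random first layer, trained last layer" setup easy to test numerically.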
