Paper Title

Phase Transitions in Rate Distortion Theory and Deep Learning

Paper Authors

Philipp Grohs, Andreas Klotz, Felix Voigtlaender

Paper Abstract


Rate distortion theory is concerned with optimally encoding a given signal class $\mathcal{S}$ using a budget of $R$ bits, as $R\to\infty$. We say that $\mathcal{S}$ can be compressed at rate $s$ if we can achieve an error of $\mathcal{O}(R^{-s})$ for encoding $\mathcal{S}$; the supremal compression rate is denoted $s^\ast(\mathcal{S})$. Given a fixed coding scheme, there usually are elements of $\mathcal{S}$ that are compressed at a higher rate than $s^\ast(\mathcal{S})$ by the given coding scheme; we study the size of this set of signals. We show that for certain "nice" signal classes $\mathcal{S}$, a phase transition occurs: We construct a probability measure $\mathbb{P}$ on $\mathcal{S}$ such that for every coding scheme $\mathcal{C}$ and any $s > s^\ast(\mathcal{S})$, the set of signals encoded with error $\mathcal{O}(R^{-s})$ by $\mathcal{C}$ forms a $\mathbb{P}$-null-set. In particular our results apply to balls in Besov and Sobolev spaces that embed compactly into $L^2(\Omega)$ for a bounded Lipschitz domain $\Omega$. As an application, we show that several existing sharpness results concerning function approximation using deep neural networks are generically sharp. We also provide quantitative and non-asymptotic bounds on the probability that a random $f\in\mathcal{S}$ can be encoded to within accuracy $\varepsilon$ using $R$ bits. This result is applied to the problem of approximately representing $f\in\mathcal{S}$ to within accuracy $\varepsilon$ by a (quantized) neural network that is constrained to have at most $W$ nonzero weights and is generated by an arbitrary "learning" procedure. We show that for any $s > s^\ast(\mathcal{S})$ there are constants $c,C$ such that, no matter how we choose the "learning" procedure, the probability of success is bounded from above by $\min\big\{1,2^{C\cdot W\lceil\log_2(1+W)\rceil^2 -c\cdot\varepsilon^{-1/s}}\big\}$.
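
The final bound can be made concrete with a small numerical sketch. The snippet below is a hypothetical illustration, not code from the paper: the constants $c, C$ are unspecified in the abstract, so placeholder values are used. It evaluates $\min\big\{1,2^{C\cdot W\lceil\log_2(1+W)\rceil^2 -c\cdot\varepsilon^{-1/s}}\big\}$ and shows how, for fixed $W$, the bound collapses from 1 to essentially 0 once $\varepsilon^{-1/s}$ outgrows $W\lceil\log_2(1+W)\rceil^2$.

```python
import math


def success_probability_upper_bound(W, eps, s, c=1.0, C=1.0):
    """Evaluate min{1, 2^(C*W*ceil(log2(1+W))^2 - c*eps^(-1/s))}, the abstract's
    upper bound on the probability that an arbitrary "learning" procedure finds
    a quantized network with at most W nonzero weights approximating a random
    signal to accuracy eps.  The constants c, C are not given in the abstract;
    the defaults here are placeholders for illustration only."""
    exponent = C * W * math.ceil(math.log2(1 + W)) ** 2 - c * eps ** (-1.0 / s)
    if exponent >= 0:
        return 1.0  # the bound is vacuous (capped at 1)
    return 2.0 ** exponent  # underflows to 0.0 for very negative exponents


# For fixed W, the bound drops from 1 to (numerically) 0 once eps^(-1/s)
# dominates W * ceil(log2(1+W))^2 -- the claimed obstruction to learning
# at rates above s^*(S).
for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"eps = {eps:.0e}: bound = {success_probability_upper_bound(1000, eps, s=0.5)}")
```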
