Paper Title


Affine symmetries and neural network identifiability

Authors

Verner Vlačić, Helmut Bölcskei

Abstract


We address the following question of neural network identifiability: Suppose we are given a function $f:\mathbb{R}^m\to\mathbb{R}^n$ and a nonlinearity $\rho$. Can we specify the architecture, weights, and biases of all feed-forward neural networks with respect to $\rho$ giving rise to $f$? Existing literature on the subject suggests that the answer should be yes, provided we are only concerned with finding networks that satisfy certain "genericity conditions". Moreover, the identified networks are mutually related by symmetries of the nonlinearity. For instance, the $\tanh$ function is odd, and so flipping the signs of the incoming and outgoing weights of a neuron does not change the output map of the network. The results known hitherto, however, apply either to single-layer networks, or to networks satisfying specific structural assumptions (such as full connectivity), as well as to specific nonlinearities. In an effort to answer the identifiability question in greater generality, we consider arbitrary nonlinearities with potentially complicated affine symmetries, and we show that the symmetries can be used to find a rich set of networks giving rise to the same function $f$. The set obtained in this manner is, in fact, exhaustive (i.e., it contains all networks giving rise to $f$) unless there exists a network $\mathcal{A}$ "with no internal symmetries" giving rise to the identically zero function. This result can thus be interpreted as an analog of the rank-nullity theorem for linear operators. We furthermore exhibit a class of "$\tanh$-type" nonlinearities (including the $\tanh$ function itself) for which such a network $\mathcal{A}$ does not exist, thereby solving the identifiability question for these nonlinearities in full generality. Finally, we show that this class contains nonlinearities with arbitrarily complicated symmetries.
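The sign-flip symmetry mentioned in the abstract can be checked numerically. The following minimal sketch (not from the paper; the network shapes and weights are arbitrary illustrative choices) builds a single-hidden-layer $\tanh$ network, flips the signs of one neuron's incoming weights, bias, and outgoing weights, and verifies that the output map is unchanged, since $\tanh(-t) = -\tanh(t)$ makes the two flips cancel.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random single-hidden-layer tanh network: f(x) = W2 @ tanh(W1 @ x + b1) + b2
W1 = rng.standard_normal((4, 3))
b1 = rng.standard_normal(4)
W2 = rng.standard_normal((2, 4))
b2 = rng.standard_normal(2)

def net(x, W1, b1, W2, b2):
    """Evaluate the feed-forward tanh network at input x."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

# Flip the sign of neuron 0's incoming weights and bias, and of its
# outgoing weights. Because tanh is odd, the neuron's activation flips
# sign, and the flipped outgoing weights flip it back.
W1f, b1f, W2f = W1.copy(), b1.copy(), W2.copy()
W1f[0, :] *= -1
b1f[0] *= -1
W2f[:, 0] *= -1

# The two parameterizations realize the same function.
for _ in range(5):
    x = rng.standard_normal(3)
    assert np.allclose(net(x, W1, b1, W2, b2), net(x, W1f, b1f, W2f, b2))
```

Applying this flip independently to each hidden neuron already yields $2^4$ distinct weight settings with identical output maps, which is the kind of symmetry-induced family of networks the paper characterizes.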
