Paper title
A statistical mechanics framework for Bayesian deep neural networks beyond the infinite-width limit
Paper authors
Paper abstract
Despite the practical success of deep neural networks, a comprehensive theoretical framework that can predict practically relevant scores, such as the test accuracy, from knowledge of the training data is currently lacking. Huge simplifications arise in the infinite-width limit, where the number of units $N_\ell$ in each hidden layer ($\ell=1,\dots,L$, with $L$ the depth of the network) far exceeds the number $P$ of training examples. This idealisation, however, blatantly departs from the reality of deep learning practice. Here, we use the toolset of statistical mechanics to overcome these limitations and derive an approximate partition function for fully-connected deep neural architectures, which encodes information about the trained models. The computation holds in the "thermodynamic limit", where both $N_\ell$ and $P$ are large and their ratio $\alpha_\ell = P/N_\ell$ is finite. This advance allows us to obtain (i) a closed formula for the generalisation error associated with a regression task in a one-hidden-layer network with finite $\alpha_1$; (ii) an approximate expression of the partition function for deep architectures (via an "effective action" that depends on a finite number of "order parameters"); (iii) a link between deep neural networks in the proportional asymptotic limit and Student's $t$ processes.
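For orientation, the partition function the abstract refers to can be sketched schematically as follows. This is a generic Bayesian formulation under standard assumptions (a Gaussian prior $P(\theta)$ over the network weights and a quadratic regression loss at inverse temperature $\beta$ on $P$ training pairs $(x^\mu, y^\mu)$); the notation is chosen for illustration and is not taken verbatim from the paper:
$$
% Minimal sketch of a Bayesian partition function for a fully-connected
% network f(x;\theta); prior, loss and temperature are illustrative assumptions.
Z \;=\; \int \mathrm{d}\theta \, P(\theta)\,
\exp\!\Big[-\frac{\beta}{2}\sum_{\mu=1}^{P}\big(f(x^{\mu};\theta)-y^{\mu}\big)^{2}\Big],
\qquad
\alpha_{\ell}=\frac{P}{N_{\ell}}\ \text{kept finite as } N_{\ell},\,P\to\infty .
$$
In the proportional ("thermodynamic") limit described above, both the number of examples and the layer widths grow while each ratio $\alpha_\ell$ stays of order one, in contrast to the infinite-width limit where every $\alpha_\ell \to 0$.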