Paper Title
Prevalence of Neural Collapse during the terminal phase of deep learning training
Paper Authors
Paper Abstract
Modern practice for training classification deepnets involves a Terminal Phase of Training (TPT), which begins at the epoch where training error first vanishes. During TPT, the training error stays effectively zero while training loss is pushed towards zero. Direct measurements of TPT, for three prototypical deepnet architectures and across seven canonical classification datasets, expose a pervasive inductive bias we call Neural Collapse, involving four deeply interconnected phenomena: (NC1) Cross-example within-class variability of last-layer training activations collapses to zero, as the individual activations themselves collapse to their class-means; (NC2) The class-means collapse to the vertices of a Simplex Equiangular Tight Frame (ETF); (NC3) Up to rescaling, the last-layer classifiers collapse to the class-means, or in other words to the Simplex ETF, i.e. to a self-dual configuration; (NC4) For a given activation, the classifier's decision collapses to simply choosing whichever class has the closest train class-mean, i.e. the Nearest Class Center (NCC) decision rule. The symmetric and very simple geometry induced by the TPT confers important benefits, including better generalization performance, better robustness, and better interpretability.
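To make the four phenomena concrete, below is a minimal sketch (not the authors' code; the function and variable names are hypothetical) of how the NC1–NC4 quantities could be measured from a trained network's last layer. It assumes NumPy arrays `H` of shape (N, d) holding last-layer training activations, `y` of shape (N,) holding integer labels in 0..K-1, and `W` of shape (K, d) holding the last-layer classifier weights, and it ignores the classifier bias for simplicity.

```python
import numpy as np

def neural_collapse_metrics(H, y, W):
    """H: (N, d) last-layer activations; y: (N,) labels in 0..K-1;
    W: (K, d) last-layer classifier weights (bias omitted)."""
    N, d = H.shape
    K = W.shape[0]
    mu_G = H.mean(axis=0)                                        # global mean
    mus = np.stack([H[y == k].mean(axis=0) for k in range(K)])   # class means
    M = mus - mu_G                                               # centered class means, (K, d)

    # NC1: within-class scatter measured against between-class scatter;
    # this ratio should shrink toward zero as activations collapse to class means.
    diffs = H - mus[y]
    Sigma_W = diffs.T @ diffs / N
    Sigma_B = M.T @ M / K
    nc1 = np.trace(Sigma_W @ np.linalg.pinv(Sigma_B)) / K

    # NC2: a Simplex ETF has equal-norm vertices with pairwise cosines
    # of exactly -1/(K-1); measure deviation from both properties.
    norms = np.linalg.norm(M, axis=1)
    cosines = (M @ M.T) / np.outer(norms, norms)
    off_diag = cosines[~np.eye(K, dtype=bool)]
    nc2_equinorm = norms.std() / norms.mean()
    nc2_equiangle = np.abs(off_diag + 1.0 / (K - 1)).max()

    # NC3: self-duality; after Frobenius normalization, the classifier
    # rows should align with the centered class means.
    nc3 = np.linalg.norm(W / np.linalg.norm(W) - M / np.linalg.norm(M))

    # NC4: agreement between the classifier's decision and the
    # Nearest Class Center (NCC) rule.
    net_pred = (H @ W.T).argmax(axis=1)
    dists = np.linalg.norm(H[:, None, :] - mus[None, :, :], axis=2)
    nc4_agreement = (net_pred == dists.argmin(axis=1)).mean()

    return dict(nc1=nc1, nc2_equinorm=nc2_equinorm,
                nc2_equiangle=nc2_equiangle, nc3=nc3,
                nc4_agreement=nc4_agreement)
```

Tracked across TPT epochs, these quantities should trend toward their collapsed values: nc1, the two nc2 deviations, and nc3 toward zero, and nc4_agreement toward 1.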