Paper Title
Revisiting Structured Dropout
Paper Authors
Paper Abstract
Large neural networks are often overparameterised and prone to overfitting; Dropout is a widely used regularization technique to combat overfitting and improve model generalization. However, unstructured Dropout is not always effective for specific network architectures, and this has led to the development of multiple structured Dropout approaches that improve model performance and, sometimes, reduce the computational resources required for inference. In this work, we revisit structured Dropout, comparing different Dropout approaches on natural language processing and computer vision tasks for multiple state-of-the-art networks. Additionally, we devise an approach to structured Dropout we call \textbf{\emph{ProbDropBlock}}, which drops contiguous blocks from feature maps with a probability given by the normalized feature salience values. We find that, with a simple scheduling strategy, the proposed approach to structured Dropout consistently improves model performance compared to baselines and other Dropout approaches across a diverse range of tasks and models. In particular, we show \textbf{\emph{ProbDropBlock}} improves RoBERTa finetuning on MNLI by $0.22\%$ and training of ResNet50 on ImageNet by $0.28\%$.
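The abstract describes \textbf{\emph{ProbDropBlock}} as dropping contiguous blocks from feature maps with probabilities given by normalized feature salience. The sketch below is only a rough illustration of that mechanism, not the authors' implementation: the function name `prob_dropblock`, the choice of mean absolute activation as the salience measure, the `drop_scale` parameter, the block expansion via max-pooling, and the rescaling of kept activations are all assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def prob_dropblock(x, block_size=3, drop_scale=0.1, training=True):
    """Illustrative sketch: drop contiguous blocks from a feature map with
    per-location probabilities proportional to normalized feature salience.
    x is assumed to be a (N, C, H, W) activation tensor; block_size is odd."""
    if not training:
        return x
    n, c, h, w = x.shape
    # Salience per spatial location: mean |activation| over channels,
    # normalized to sum to 1 over each feature map (an assumed salience measure).
    salience = x.abs().mean(dim=1, keepdim=True)                      # (N, 1, H, W)
    salience = salience / salience.sum(dim=(2, 3), keepdim=True).clamp_min(1e-12)
    # Sample block centres with probability proportional to salience;
    # drop_scale controls the expected fraction of sampled centres.
    centre_mask = (torch.rand_like(salience) < drop_scale * h * w * salience).float()
    # Expand each sampled centre into a block_size x block_size dropped block.
    block_mask = F.max_pool2d(centre_mask, kernel_size=block_size,
                              stride=1, padding=block_size // 2)
    keep_mask = 1.0 - block_mask
    # Rescale the kept activations so the expected magnitude is preserved.
    keep_frac = keep_mask.mean(dim=(2, 3), keepdim=True).clamp_min(1e-12)
    return x * keep_mask / keep_frac
```

In this sketch, locations with higher salience are more likely to seed a dropped block, which is one plausible reading of "a probability given by the normalized feature salience values"; the exact sampling and normalization used in the paper may differ.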