Paper Title

Deep Ensembles on a Fixed Memory Budget: One Wide Network or Several Thinner Ones?

Paper Authors

Nadezhda Chirkova, Ekaterina Lobacheva, Dmitry Vetrov

Paper Abstract

One of the generally accepted views of modern deep learning is that increasing the number of parameters usually leads to better quality. The two easiest ways to increase the number of parameters are to increase the size of the network, e.g. its width, or to train a deep ensemble; both approaches improve performance in practice. In this work, we consider a fixed memory budget setting and investigate what is more effective: to train a single wide network, or to perform a memory split -- to train an ensemble of several thinner networks with the same total number of parameters? We find that, for large enough budgets, the number of networks in the ensemble corresponding to the optimal memory split is usually larger than one. Interestingly, this effect holds for commonly used sizes of standard architectures. For example, one WideResNet-28-10 achieves significantly worse test accuracy on CIFAR-100 than an ensemble of sixteen thinner WideResNets: 80.6% and 82.52%, respectively. We call the described effect the Memory Split Advantage and show that it holds for a variety of datasets and model architectures.
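
To make the memory-split setting concrete, the sketch below shows one way to answer "how wide can each of k ensemble members be under a fixed parameter budget?". This is not the paper's code: a small MLP stands in for WideResNet, and the widths, input size, and the helper functions (`make_net`, `n_params`, `widest_within_budget`) are illustrative assumptions.

```python
# Minimal sketch of the memory-split comparison: given a fixed parameter budget,
# search for the largest member width an ensemble of k networks can use.
# NOT the paper's code; a small MLP stands in for WideResNet.
import torch.nn as nn


def make_net(width: int, in_dim: int = 3 * 32 * 32, n_classes: int = 100) -> nn.Module:
    """A two-hidden-layer MLP whose parameter count grows with `width`."""
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(in_dim, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, n_classes),
    )


def n_params(model: nn.Module) -> int:
    """Total number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters())


def widest_within_budget(budget: int, k: int, hi: int = 4096) -> int:
    """Largest width such that k networks of that width stay within `budget` parameters."""
    lo = 1  # assumes width 1 always fits the budget
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if k * n_params(make_net(mid)) <= budget:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Fixed memory budget: the parameter count of a single "wide" network.
budget = n_params(make_net(width=1024))

for k in (1, 2, 4, 8, 16):
    w = widest_within_budget(budget, k)
    total = k * n_params(make_net(w))
    print(f"memory split into {k:2d} nets -> member width {w:4d}, "
          f"{total:,} / {budget:,} parameters used")

# At test time, the ensemble prediction is typically the average of the members'
# softmax outputs, e.g.:
#   probs = torch.stack([net(x).softmax(dim=-1) for net in nets]).mean(dim=0)
```

The paper's experiments compare such equal-budget configurations (one wide network vs. several thinner ones) on real architectures such as WideResNet; the MLP here only illustrates the budget-splitting arithmetic.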
