Paper Title
Deep Ensembles Work, But Are They Necessary?
Paper Authors
Paper Abstract
Ensembling neural networks is an effective way to increase accuracy, and can often match the performance of individual larger models. This observation poses a natural question: given the choice between a deep ensemble and a single neural network with similar accuracy, is one preferable over the other? Recent work suggests that deep ensembles may offer distinct benefits beyond predictive power: namely, uncertainty quantification and robustness to dataset shift. In this work, we demonstrate limitations to these purported benefits, and show that a single (but larger) neural network can replicate these qualities. First, we show that ensemble diversity, by any metric, does not meaningfully contribute to an ensemble's uncertainty quantification on out-of-distribution (OOD) data, but is instead highly correlated with the relative improvement of a single larger model. Second, we show that the OOD performance afforded by ensembles is strongly determined by their in-distribution (InD) performance, and -- in this sense -- is not indicative of any "effective robustness". While deep ensembles are a practical way to achieve improvements to predictive power, uncertainty quantification, and robustness, our results show that these improvements can be replicated by a (larger) single model.
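The abstract contrasts a deep ensemble with a single larger model of comparable capacity. As a concrete illustration only, and not the authors' code, the sketch below assumes PyTorch and shows the standard construction: several independently initialized (and, in practice, independently trained) small networks whose softmax outputs are averaged, next to a single widened network with a roughly matched parameter count. The architectures, widths, and synthetic batch are placeholder assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def make_mlp(width, in_dim=32, n_classes=10):
    # Placeholder architecture; any backbone works the same way.
    return nn.Sequential(
        nn.Linear(in_dim, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, n_classes),
    )

# Deep ensemble: K independently initialized copies of the same
# small architecture (each would be trained separately in practice).
K = 4
ensemble = [make_mlp(width=64) for _ in range(K)]

# Single larger model: widening the hidden layers gives a roughly
# comparable parameter budget (a common matching heuristic; exact
# matching would tune width or depth).
single_large = make_mlp(width=128)

x = torch.randn(8, 32)  # stand-in for an InD or OOD batch

with torch.no_grad():
    # Ensemble prediction: average the members' softmax probabilities.
    probs = torch.stack([F.softmax(m(x), dim=-1) for m in ensemble]).mean(0)
    probs_single = F.softmax(single_large(x), dim=-1)

# Predictive entropy: the uncertainty measure typically compared
# between ensembles and single models on out-of-distribution inputs.
entropy = -(probs * probs.log()).sum(-1)
entropy_single = -(probs_single * probs_single.log()).sum(-1)
print(entropy.mean().item(), entropy_single.mean().item())

Under the paper's framing, one would train both models, then ask whether the ensemble's OOD accuracy or uncertainty exceeds what its InD performance alone predicts; the abstract's claim is that it does not, and that the single larger model replicates the same improvements.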