Paper Title
Hydra: Preserving Ensemble Diversity for Model Distillation
Paper Authors
Paper Abstract
Ensembles of models have been empirically shown to improve predictive performance and to yield robust measures of uncertainty. However, they are expensive in computation and memory. Therefore, recent research has focused on distilling ensembles into a single compact model, reducing the computational and memory burden of the ensemble while trying to preserve its predictive behavior. Most existing distillation formulations summarize the ensemble by capturing its average predictions. As a result, the diversity of the ensemble predictions, stemming from the individual members, is lost. Thus, the distilled model cannot provide a measure of uncertainty comparable to that of the original ensemble. To more faithfully retain the diversity of the ensemble, we propose a distillation method based on a single multi-headed neural network, which we refer to as Hydra. The shared body network learns a joint feature representation that enables each head to capture the predictive behavior of a single ensemble member. We demonstrate that with a slight increase in parameter count, Hydra improves distillation performance in classification and regression settings while capturing the uncertainty behavior of the original ensemble on both in-domain and out-of-distribution tasks.
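To make the multi-headed architecture described in the abstract concrete, below is a minimal sketch, assuming a PyTorch-style implementation with an MLP body, one linear head per ensemble member, and a per-head KL distillation loss against each member's softened predictions. The class and function names (HydraNet, distillation_loss) and all hyperparameters are hypothetical illustrations, not the paper's actual implementation.

```python
# Hypothetical sketch of a Hydra-style multi-headed distillation model.
# Assumptions (not from the paper): PyTorch, an MLP body, one linear head
# per ensemble member, per-head KL distillation with a temperature.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HydraNet(nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes, num_heads):
        super().__init__()
        # Shared body: learns a joint feature representation used by all heads.
        self.body = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # One lightweight head per ensemble member to be distilled.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, num_classes) for _ in range(num_heads)]
        )

    def forward(self, x):
        features = self.body(x)
        # Per-head logits stacked to shape (num_heads, batch, num_classes).
        return torch.stack([head(features) for head in self.heads])


def distillation_loss(head_logits, member_logits, temperature=2.0):
    """Average KL divergence between each head and its matching ensemble member."""
    loss = 0.0
    for h_logit, m_logit in zip(head_logits, member_logits):
        student = F.log_softmax(h_logit / temperature, dim=-1)
        teacher = F.softmax(m_logit / temperature, dim=-1)
        loss = loss + F.kl_div(student, teacher, reduction="batchmean")
    return loss / len(head_logits)


# Usage: distill a 4-member ensemble's predictions on a toy batch.
model = HydraNet(in_dim=16, hidden_dim=64, num_classes=10, num_heads=4)
x = torch.randn(8, 16)                 # batch of 8 inputs
member_logits = torch.randn(4, 8, 10)  # precomputed teacher logits, one set per member
loss = distillation_loss(model(x), member_logits)
loss.backward()
```

Because each head targets a single ensemble member rather than the ensemble mean, the distilled model can still express member disagreement; an ensemble-style predictive distribution is then recovered by averaging the heads' softmax outputs at test time.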