Paper Title

MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures

Paper Authors

Jeongun Ryu, Jaewoong Shin, Hae Beom Lee, Sung Ju Hwang

Paper Abstract

Regularization and transfer learning are two popular techniques to enhance generalization on unseen data, which is a fundamental problem of machine learning. Regularization techniques are versatile, as they are task- and architecture-agnostic, but they do not exploit the large amount of available data. Transfer learning methods learn to transfer knowledge from one domain to another, but may not generalize across tasks and architectures, and may introduce additional training cost for adapting to the target task. To bridge the gap between the two, we propose a transferable perturbation, MetaPerturb, which is meta-learned to improve generalization performance on unseen data. MetaPerturb is implemented as a set-based lightweight network that is agnostic to the size and order of its input and is shared across layers. We then propose a meta-learning framework to jointly train the perturbation function over heterogeneous tasks in parallel. As MetaPerturb is a set function trained over diverse distributions across layers and tasks, it can generalize to heterogeneous tasks and architectures. We validate the efficacy and generality of MetaPerturb trained on a specific source domain and architecture by applying it to the training of diverse neural architectures on heterogeneous target datasets, comparing against various regularizers and fine-tuning. The results show that networks trained with MetaPerturb significantly outperform the baselines on most tasks and architectures, with a negligible increase in parameter size and no hyperparameters to tune.
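
To make the abstract's description more concrete, below is a minimal, hypothetical PyTorch sketch of what a set-based, layer-shared perturbation function could look like: it maps per-channel statistics of a feature map to channel-wise noise scales, and because every channel is processed identically and the set is pooled, the module is agnostic to the number and order of channels. The class name PerturbFunction, the use of per-channel mean activations as the statistic, and the sigmoid scaling are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class PerturbFunction(nn.Module):
    """Hypothetical sketch of a set-based, layer-shared perturbation module.

    Per-channel summary statistics of a feature map are encoded, pooled into a
    set-level code, and decoded into one multiplicative scale per channel.
    Since each channel is treated identically and the pooling is symmetric,
    the module does not depend on the number or ordering of channels, so a
    single instance can be shared across layers (and, once meta-trained,
    reused with different architectures).
    """

    def __init__(self, hidden_dim: int = 8):
        super().__init__()
        # Per-channel encoder: 1 statistic (mean activation) -> hidden code.
        self.encoder = nn.Sequential(nn.Linear(1, hidden_dim), nn.ReLU())
        # Decoder combines the per-channel code with the pooled set code.
        self.decoder = nn.Linear(2 * hidden_dim, 1)

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        # feature_map: (batch, channels, height, width) from any conv layer.
        b, c, h, w = feature_map.shape
        stats = feature_map.mean(dim=(0, 2, 3)).view(c, 1)    # (C, 1)
        per_channel = self.encoder(stats)                      # (C, hidden)
        pooled = per_channel.mean(dim=0, keepdim=True)         # (1, hidden)
        pooled = pooled.expand(c, -1)                          # broadcast to all channels
        scale = torch.sigmoid(
            self.decoder(torch.cat([per_channel, pooled], dim=1))
        )                                                      # (C, 1)
        # Multiplicative, channel-wise perturbation of the feature map.
        return feature_map * scale.view(1, c, 1, 1)
```

Under this reading, meta-training would optimize the parameters of this single shared module jointly over multiple source tasks, while at meta-test time the module is frozen and simply inserted after the convolutional layers of a new target architecture, which is consistent with the abstract's claims of negligible parameter overhead and no hyperparameters to tune.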
