Paper Title

Hyper-Learning for Gradient-Based Batch Size Adaptation

Paper Authors

Calum Robert MacLellan, Feng Dong

Paper Abstract

Scheduling the batch size to increase is an effective strategy to control gradient noise when training deep neural networks. Current approaches implement scheduling heuristics that neglect structure within the optimization procedure, limiting their flexibility to the training dynamics and their capacity to discern the impact of their adaptations on generalization. We introduce Arbiter as a new hyperparameter optimization algorithm to perform batch size adaptations for learnable scheduling heuristics using gradients from a meta-objective function, which overcomes previous heuristic constraints by enforcing a novel learning process called hyper-learning. With hyper-learning, Arbiter formulates a neural network agent to generate optimal batch size samples for an inner deep network by learning an adaptive heuristic through observing concomitant responses over T inner descent steps. Arbiter avoids unrolled optimization and does not require hypernetworks to facilitate gradients, making it reasonably cheap, simple to implement, and versatile across different tasks. We demonstrate Arbiter's effectiveness in several illustrative experiments: acting as a stand-alone batch size scheduler; complementing fixed batch size schedules with greater flexibility; and promoting variance reduction during stochastic meta-optimization of the learning rate.
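To make the abstract's control flow concrete, the minimal PyTorch sketch below shows only the outer/inner structure it describes: an arbiter network observes training statistics, proposes a batch size, an inner network takes T descent steps at that size, and a meta-objective on held-out data scores the proposal. This is a hypothetical illustration, not the authors' implementation: the score-function (REINFORCE) update stands in for the paper's hyper-learning gradients, and the statistics inputs, toy data, and all hyperparameters are assumptions.

```python
# Hypothetical sketch of the outer/inner loop described in the abstract.
# The REINFORCE update is a generic stand-in, NOT the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Arbiter(nn.Module):
    """Tiny agent mapping training statistics to a batch size proposal."""
    def __init__(self, stat_dim=2, b_min=16, b_max=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(stat_dim, 32), nn.Tanh(),
                                 nn.Linear(32, 1))
        self.b_min, self.b_max = b_min, b_max

    def forward(self, stats):
        frac = torch.sigmoid(self.net(stats)).squeeze(-1)  # in (0, 1)
        return self.b_min + frac * (self.b_max - self.b_min)

def inner_descent(model, opt, x, y, batch_size, T):
    """Run T inner SGD steps at the proposed batch size; return last loss."""
    loss = torch.tensor(0.0)
    for _ in range(T):
        idx = torch.randint(0, x.size(0), (batch_size,))
        loss = F.cross_entropy(model(x[idx]), y[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.detach()

# Toy data and models (assumed sizes, for illustration only).
x, y = torch.randn(4096, 20), torch.randint(0, 3, (4096,))
x_val, y_val = torch.randn(512, 20), torch.randint(0, 3, (512,))
inner = nn.Linear(20, 3)
inner_opt = torch.optim.SGD(inner.parameters(), lr=0.1)
arbiter = Arbiter()
outer_opt = torch.optim.Adam(arbiter.parameters(), lr=1e-2)

T, n_outer = 10, 20
baseline, train_loss = 0.0, torch.tensor(1.0)
for step in range(n_outer):
    stats = torch.tensor([train_loss.item(), step / n_outer])
    mu = arbiter(stats)                        # continuous proposal
    dist = torch.distributions.Normal(mu, 16.0)
    b_sample = dist.sample()                   # explore around the proposal
    b = int(b_sample.round().clamp(arbiter.b_min, arbiter.b_max))
    train_loss = inner_descent(inner, inner_opt, x, y, b, T)
    with torch.no_grad():
        meta_loss = F.cross_entropy(inner(x_val), y_val)  # meta-objective
    # Score-function surrogate: a meta-loss below the running baseline
    # reinforces the sampled batch size (stand-in for hyper-learning).
    advantage = meta_loss.item() - baseline
    baseline = 0.9 * baseline + 0.1 * meta_loss.item()
    outer_loss = advantage * dist.log_prob(b_sample)
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()
    print(f"step {step}: batch={b}, meta_loss={meta_loss.item():.3f}")
```

The Gaussian exploration and running baseline are generic variance-reduction choices for the stand-in estimator; the paper instead obtains gradients of the meta-objective directly via hyper-learning, avoiding unrolled optimization and hypernetworks.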
