Paper Title

MTAdam: Automatic Balancing of Multiple Training Loss Terms

Paper Authors

Itzik Malkiel, Lior Wolf

Paper Abstract

When training neural models, it is common to combine multiple loss terms. The balancing of these terms requires considerable human effort and is computationally demanding. Moreover, the optimal trade-off between the loss terms can change as training progresses, especially for adversarial terms. In this work, we generalize the Adam optimization algorithm to handle multiple loss terms. The guiding principle is that for every layer, the gradient magnitude of the terms should be balanced. To this end, Multi-Term Adam (MTAdam) computes the derivative of each loss term separately, infers the first and second moments per parameter and loss term, and calculates a first moment for the magnitude per layer of the gradients arising from each loss. This magnitude is used to continuously balance the gradients across all layers, in a manner that both varies from one layer to the next and dynamically changes over time. Our results show that training with the new method leads to fast recovery from suboptimal initial loss weighting and to training outcomes that match conventional training with the prescribed hyperparameters of each method.
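The abstract describes MTAdam procedurally. Below is a minimal PyTorch sketch of that idea, not the authors' reference implementation: the function name `mtadam_step`, the `(parameter, term)`-keyed state dictionary, the choice of the first loss term as the magnitude anchor, and tracking magnitudes per parameter tensor (as a stand-in for the paper's per-layer magnitudes) are assumptions made for illustration.

```python
import torch

def mtadam_step(params, loss_terms, state, lr=1e-3,
                betas=(0.9, 0.999, 0.9), eps=1e-8):
    """One MTAdam-style update over several loss terms (illustrative sketch).

    Each loss term gets its own gradients and its own Adam moments. A running
    estimate of the gradient magnitude per parameter tensor rescales every
    term to the magnitude of the first term, which acts as the anchor.
    `params` is a list of leaf tensors with requires_grad=True.
    """
    beta1, beta2, beta3 = betas  # beta3: EMA rate for gradient magnitudes

    # Gradients of every loss term with respect to every parameter, computed separately.
    grads = [torch.autograd.grad(loss, params,
                                 retain_graph=(i < len(loss_terms) - 1))
             for i, loss in enumerate(loss_terms)]

    for p_idx, p in enumerate(params):
        update = torch.zeros_like(p)
        for t_idx in range(len(loss_terms)):
            g = grads[t_idx][p_idx]
            s = state.setdefault((p_idx, t_idx), {
                'm': torch.zeros_like(p), 'v': torch.zeros_like(p),
                'mag': torch.zeros((), device=p.device), 'step': 0})
            # Running (first-moment) estimate of this term's gradient magnitude.
            s['mag'] = beta3 * s['mag'] + (1 - beta3) * g.norm()
            # Rescale so this term's gradient matches the anchor term's magnitude.
            anchor = state[(p_idx, 0)]['mag']
            g = g * (anchor / (s['mag'] + eps))
            # Standard Adam first/second moments, kept separately per loss term.
            s['step'] += 1
            s['m'] = beta1 * s['m'] + (1 - beta1) * g
            s['v'] = beta2 * s['v'] + (1 - beta2) * g * g
            m_hat = s['m'] / (1 - beta1 ** s['step'])
            v_hat = s['v'] / (1 - beta2 ** s['step'])
            update += m_hat / (v_hat.sqrt() + eps)
        p.data.add_(update, alpha=-lr)
```

In use, each loss (for example a reconstruction term and an adversarial term) would be computed separately and passed in a list, rather than being summed into a single manually weighted objective.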
