Paper Title

Orthogonalized SGD and Nested Architectures for Anytime Neural Networks

Authors

Chengcheng Wan, Henry Hoffmann, Shan Lu, Michael Maire

Abstract

We propose a novel variant of SGD customized for training network architectures that support anytime behavior: such networks produce a series of increasingly accurate outputs over time. Efficient architectural designs for these networks focus on re-using internal state; subnetworks must produce representations relevant both for immediate prediction and for refinement by subsequent network stages. We consider traditional branched networks as well as a new class of recursively nested networks. Our new optimizer, Orthogonalized SGD, dynamically re-balances task-specific gradients when training a multitask network. In the context of anytime architectures, this optimizer projects gradients from later outputs onto a parameter subspace in which they do not interfere with gradients from earlier outputs. Experiments demonstrate that training with Orthogonalized SGD significantly improves generalization accuracy of anytime networks.
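The abstract only sketches the projection idea, so here is a minimal, hypothetical NumPy sketch of the core operation: removing from a later output's gradient the component that conflicts with an earlier output's gradient. The function name `project_out`, the toy gradient values, and the treatment of all parameters as a single flattened vector are assumptions for illustration; the paper's actual optimizer applies this inside SGD across the network's parameters and multiple anytime outputs.

```python
import numpy as np

def project_out(g_late, g_early, eps=1e-12):
    """Project g_late onto the subspace orthogonal to g_early.

    Illustrative sketch (not the paper's implementation): strips the
    component of the late-output gradient that lies along the
    early-output gradient, so the late update cannot interfere with
    the earlier output's descent direction.
    """
    coeff = np.dot(g_late, g_early) / (np.dot(g_early, g_early) + eps)
    return g_late - coeff * g_early

# Toy flattened gradients from two anytime outputs (made-up values).
g_early = np.array([1.0, 0.0, 2.0])
g_late = np.array([0.5, 1.0, -1.0])

g_late_orth = project_out(g_late, g_early)
step = g_early + g_late_orth  # combined descent direction for one SGD step
print(np.dot(g_late_orth, g_early))  # ~0.0: no component along g_early
```

Under this sketch, the early output's gradient is applied unchanged, which reflects the paper's prioritization of earlier outputs; only later outputs have their gradients projected.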
