Paper Title
Sideways: Depth-Parallel Training of Video Models
Paper Authors
Paper Abstract
We propose Sideways, an approximate backpropagation scheme for training video models. In standard backpropagation, the gradients and activations at every computation step through the model are temporally synchronized. The forward activations need to be stored until the backward pass is executed, preventing inter-layer (depth) parallelization. However, can we leverage smooth, redundant input streams such as videos to develop a more efficient training scheme? Here, we explore an alternative to backpropagation; we overwrite network activations whenever new ones, i.e., from new frames, become available. Such a more gradual accumulation of information from both passes breaks the precise correspondence between gradients and activations, leading to theoretically more noisy weight updates. Counter-intuitively, we show that Sideways training of deep convolutional video networks not only still converges, but can also potentially exhibit better generalization compared to standard synchronized backpropagation.
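The mechanism the abstract describes, overwriting each layer's activation buffer as new frames arrive, so a weight update pairs an older error signal with newer activations, can be sketched as a toy NumPy example. Everything here is illustrative rather than the paper's actual architecture or pipelined schedule: the two-layer ReLU network, the synthetic smooth "frame" stream, the regression target, and the one-frame staleness are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU network trained on a smooth, redundant input stream
# (a stand-in for video frames). Sizes and names are illustrative.
W1 = rng.normal(scale=0.1, size=(8, 8))
W2 = rng.normal(scale=0.1, size=(8, 8))
lr = 1e-2

def frame(t):
    # Consecutive "frames" change slowly, like neighboring video frames.
    return np.sin(np.linspace(0.0, 1.0, 8) + 0.05 * t)

pending_err = None  # error signal from the previous frame, not yet applied
losses = []

for t in range(2000):
    x = frame(t)

    # Forward pass: the activation buffers a0, a1 are simply overwritten
    # by the newest frame; nothing is stored for a later backward pass.
    a0 = x
    a1 = np.maximum(W1 @ a0, 0.0)
    y = W2 @ a1

    # Backward pass for the PREVIOUS frame's error, reading the buffers the
    # current frame has just overwritten. This broken correspondence between
    # gradients and activations is the "noisier update" the abstract describes.
    if pending_err is not None:
        g1 = (W2.T @ pending_err) * (a1 > 0.0)
        W2 -= lr * np.outer(pending_err, a1)
        W1 -= lr * np.outer(g1, a0)

    target = np.tanh(2.0 * x)  # arbitrary smooth regression target
    pending_err = y - target
    losses.append(float(np.mean(pending_err ** 2)))
```

Because consecutive frames are nearly identical, the stale activations are close to the correct ones, which is why this kind of mismatched update can still reduce the loss on smooth streams.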