Paper Title
Continuous-in-Depth Neural Networks
Paper Authors
Paper Abstract
Recent work has attempted to interpret residual networks (ResNets) as one step of a forward Euler discretization of an ordinary differential equation, focusing mainly on syntactic algebraic similarities between the two systems. Discrete dynamical integrators of continuous dynamical systems, however, have a much richer structure. We first show that ResNets fail to be meaningful dynamical integrators in this richer sense. We then demonstrate that neural network models can learn to represent continuous dynamical systems, with this richer structure and properties, by embedding them into higher-order numerical integration schemes, such as the Runge-Kutta schemes. Based on these insights, we introduce ContinuousNet as a continuous-in-depth generalization of ResNet architectures. ContinuousNets exhibit an invariance to the particular computational graph manifestation. That is, the continuous-in-depth model can be evaluated with different discrete time step sizes, which changes the number of layers, and different numerical integration schemes, which changes the graph connectivity. We show that this can be used to develop an incremental-in-depth training scheme that improves model quality, while significantly decreasing training time. We also show that, once trained, the number of units in the computational graph can even be decreased, for faster inference with little-to-no accuracy drop.
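To make the abstract's correspondence concrete, the sketch below is an illustrative assumption, not the authors' released implementation: `residual_module`, `theta`, and the step size `h` are placeholder names. It contrasts the standard ResNet update with a forward-Euler step of dx/dt = f(x; θ) and a classical fourth-order Runge-Kutta step that reuses the same module, the kind of higher-order embedding the abstract describes.

```python
import numpy as np

def residual_module(x, theta):
    """Toy residual function f(x; theta): a single tanh layer.
    Placeholder for a learned module, not the paper's architecture."""
    W, b = theta
    return np.tanh(W @ x + b)

def resnet_block(x, theta):
    """Standard ResNet update: x + f(x; theta)."""
    return x + residual_module(x, theta)

def euler_step(x, theta, h):
    """One forward-Euler step of dx/dt = f(x; theta).
    With h = 1 this coincides with resnet_block."""
    return x + h * residual_module(x, theta)

def rk4_step(x, theta, h):
    """One classical fourth-order Runge-Kutta step built from the same
    residual module: a higher-order graph manifestation of the same
    continuous-in-depth model."""
    k1 = residual_module(x, theta)
    k2 = residual_module(x + 0.5 * h * k1, theta)
    k3 = residual_module(x + 0.5 * h * k2, theta)
    k4 = residual_module(x + h * k3, theta)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 4
    # Parameters held fixed across depth here for simplicity (an autonomous
    # system); a full continuous-in-depth model would let them vary with t.
    theta = (0.1 * rng.standard_normal((dim, dim)), np.zeros(dim))
    x0 = rng.standard_normal(dim)

    # Integrate from t = 0 to t = 1 with different step sizes and schemes:
    # different computational graphs approximating the same trajectory.
    x_euler = x0
    for _ in range(4):          # 4 layers, h = 1/4, forward Euler
        x_euler = euler_step(x_euler, theta, 0.25)

    x_rk4 = x0
    for _ in range(2):          # 2 layers, h = 1/2, fourth-order scheme
        x_rk4 = rk4_step(x_rk4, theta, 0.5)

    print("Euler (4 steps):", x_euler)
    print("RK4   (2 steps):", x_rk4)
```

Because both loops discretize the same trajectory over t in [0, 1], changing the step size or the integration scheme changes the number of layers and the graph connectivity but not the underlying model, which is the invariance to the computational graph manifestation that the abstract claims.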