Paper Title
A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms
Paper Authors
Paper Abstract
Meta-learning of numerical algorithms for a given task consists of the data-driven identification and adaptation of an algorithmic structure and the associated hyperparameters. To limit the complexity of the meta-learning problem, neural architectures with a certain inductive bias towards favorable algorithmic structures can, and should, be used. We generalize our previously introduced Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms. In contrast to off-the-shelf deep learning approaches, it features a distinct division into modules for generation of information and for the subsequent assembly of this information towards a solution. Local information in the form of a subspace is generated by subordinate, inner, iterations of recurrent function evaluations starting at the current outer iterate. The update to the next outer iterate is computed as a linear combination of these evaluations, reducing the residual in this space, and constitutes the output of the network. We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields iterations similar to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta integrators for ordinary differential equations. Due to its modularity, the superstructure can be readily extended with functionalities needed to represent more general classes of iterative algorithms traditionally based on Taylor series expansions.
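To make the described division of labor concrete, the following minimal NumPy sketch mimics one outer iteration of such a superstructure: an inner loop of recurrent function evaluations builds a small set of vectors spanning a local subspace, and the outer update is a learned linear combination of those evaluations. All names here (r2n2_step, W, alpha) and the structure of the weights are illustrative assumptions made for this sketch, not the paper's implementation or trained parameters.

```python
import numpy as np

def r2n2_step(f, x, W, alpha):
    """One outer iteration of an R2N2-style update (illustrative sketch).

    Inner loop: k recurrent evaluations of the residual function f, each
    taken at the current outer iterate x shifted by a weighted combination
    of the previous evaluations; these evaluations span the local subspace.
    Outer update: a linear combination of the evaluations, added to x.
    """
    k = len(alpha)
    evals = []
    for i in range(k):
        # shift is 0 for the first inner evaluation, i.e., f(x) itself
        shift = sum(W[i, j] * evals[j] for j in range(i))
        evals.append(f(x + shift))
    return x + sum(a * v for a, v in zip(alpha, evals))

# Hypothetical usage on a small linear system with residual f(x) = b - A @ x;
# the weights below are hand-picked for illustration, not trained values.
A = np.array([[0.6, 0.1], [0.1, 0.5]])
b = np.array([1.0, 1.0])
f = lambda x: b - A @ x
W = np.tril(np.ones((3, 3)), k=-1)   # hypothetical inner coupling weights
alpha = np.array([0.5, 0.3, 0.1])    # hypothetical outer combination coefficients
x = np.zeros(2)
for _ in range(20):
    x = r2n2_step(f, x, W, alpha)
print(np.linalg.norm(f(x)))          # residual shrinks for this toy setup
```

With a linear residual, the inner evaluations generate vectors from a Krylov-type space, and the strictly lower-triangular coupling of inner evaluations mirrors the stage structure of an explicit Runge-Kutta scheme; this is the sense in which training the weights can recover Krylov-like solvers or Runge-Kutta-like integrators, as the abstract states.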