Paper Title

Continual Learning in Recurrent Neural Networks

Paper Authors

Benjamin Ehret, Christian Henning, Maria R. Cervera, Alexander Meulemans, Johannes von Oswald, Benjamin F. Grewe

Paper Abstract

While a diverse collection of continual learning (CL) methods has been proposed to prevent catastrophic forgetting, a thorough investigation of their effectiveness for processing sequential data with recurrent neural networks (RNNs) is lacking. Here, we provide the first comprehensive evaluation of established CL methods on a variety of sequential data benchmarks. Specifically, we shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs. In contrast to feedforward networks, RNNs iteratively reuse a shared set of weights and require working memory to process input samples. We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements, which lead to an increased need for stability at the cost of decreased plasticity for learning subsequent tasks. We additionally provide theoretical arguments supporting this interpretation by studying linear RNNs. Our study shows that established CL methods can be successfully ported to the recurrent case, and that a recent regularization approach based on hypernetworks outperforms weight-importance methods, thus emerging as a promising candidate for CL in RNNs. Overall, we provide insights on the differences between CL in feedforward networks and RNNs, while guiding towards effective solutions to tackle CL on sequential data.
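As a rough illustration of the weight-importance approach discussed in the abstract, below is a minimal sketch, assuming a PyTorch setup, of how an EWC-style quadratic penalty could be attached to an RNN whose recurrent weights are shared across time steps. The class and function names (SimpleRNNClassifier, estimate_fisher, ewc_penalty) are illustrative and are not taken from the paper's code.

```python
# Minimal EWC-style sketch for an RNN (illustrative; not the paper's implementation).
import torch
import torch.nn as nn


class SimpleRNNClassifier(nn.Module):
    """A small RNN whose recurrent weights are reused at every time step."""

    def __init__(self, n_in=10, n_hidden=32, n_out=5):
        super().__init__()
        self.rnn = nn.RNN(n_in, n_hidden, batch_first=True)
        self.readout = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        # x: (batch, time, n_in); the same recurrent weights process each step.
        _, h_last = self.rnn(x)
        return self.readout(h_last.squeeze(0))


def estimate_fisher(model, data_loader, loss_fn):
    """Diagonal empirical Fisher estimate: mean squared gradients over one task's data."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(data_loader), 1) for n, f in fisher.items()}


def ewc_penalty(model, fisher, old_params, lam=100.0):
    """Quadratic penalty anchoring important weights near their values after the previous task."""
    penalty = 0.0
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return 0.5 * lam * penalty
```

In this sketch, after finishing a task one would store old_params = {n: p.detach().clone() for n, p in model.named_parameters()} together with the Fisher estimate, and train the next task on task_loss + ewc_penalty(model, fisher, old_params). Because the same recurrent weights are reused at every time step, the penalty constrains exactly the parameters whose stability the paper argues is tied to working memory demands.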
