Paper Title
Efficient LSTM Training with Eligibility Traces
Paper Authors
Abstract
Training recurrent neural networks is predominantly achieved via backpropagation through time (BPTT). However, this algorithm is not an optimal solution from either a biological or a computational perspective. A more efficient and biologically plausible alternative to BPTT is e-prop. We investigate the applicability of e-prop to long short-term memory (LSTM) networks, for both supervised and reinforcement learning (RL) tasks. We show that e-prop is a suitable optimization algorithm for LSTMs by comparing it to BPTT on two benchmarks for supervised learning. This demonstrates that e-prop can achieve learning even for problems with long sequences of several hundred time steps. We introduce extensions that improve the performance of e-prop, some of which can also be applied to other network architectures. With the help of these extensions we show that, under certain conditions, e-prop can outperform BPTT on one of the two supervised-learning benchmarks. Finally, we deliver a proof of concept for the integration of e-prop into RL in the domain of deep recurrent Q-learning.