Paper Title
Efficient LSTM Training with Eligibility Traces
Paper Authors
Abstract
Training recurrent neural networks is predominantly achieved via backpropagation through time (BPTT). However, this algorithm is not an optimal solution from either a biological or a computational perspective. A more efficient and biologically plausible alternative to BPTT is e-prop. We investigate the applicability of e-prop to long short-term memory (LSTM) networks, for both supervised and reinforcement learning (RL) tasks. We show that e-prop is a suitable optimization algorithm for LSTMs by comparing it to BPTT on two benchmarks for supervised learning. This demonstrates that e-prop can achieve learning even for problems with long sequences of several hundred time steps. We introduce extensions that improve the performance of e-prop, some of which can also be applied to other network architectures. With the help of these extensions we show that, under certain conditions, e-prop can outperform BPTT on one of the two supervised-learning benchmarks. Finally, we deliver a proof of concept for the integration of e-prop into RL in the domain of deep recurrent Q-learning.