Paper Title
Progress Extrapolating Algorithmic Learning to Arbitrary Sequence Lengths
Paper Authors
Paper Abstract
Recent neural network models for algorithmic tasks have led to significant improvements in extrapolation to sequences much longer than those seen in training, but it remains an outstanding problem that performance still degrades on very long or adversarial sequences. We present alternative architectures and loss terms to address these issues, and our testing of these approaches has not detected any remaining extrapolation errors within memory constraints. We focus on linear-time algorithmic tasks including copy, parentheses parsing, and binary addition. First, activation binning was used to discretize the trained network in order to avoid computational drift from continuous operations, and a binning-based digital loss term was added to encourage discretizable representations. In addition, a localized differentiable memory (LDM) architecture, in contrast to distributed memory access, addressed the remaining extrapolation errors and avoided unbounded growth of internal computational states. Previous work has found that algorithmic extrapolation issues can also be alleviated with approaches relying on program traces, but the current effort does not rely on such traces.
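To illustrate the activation binning and digital loss idea mentioned in the abstract, the following is a minimal PyTorch sketch. The function names, the choice of bin centers {0, 1}, and the L1 distance penalty are assumptions made for illustration only; they are not details taken from the paper.

```python
import torch

def digital_loss(activations, bin_centers=(0.0, 1.0)):
    """Penalize activations that lie far from their nearest bin center,
    encouraging representations that survive later discretization.
    (Illustrative form; the paper's exact loss may differ.)"""
    centers = torch.tensor(bin_centers, device=activations.device)
    # Distance from each activation to every bin center: shape (..., n_bins).
    dists = (activations.unsqueeze(-1) - centers).abs()
    # Average distance to the nearest center over all activations.
    return dists.min(dim=-1).values.mean()

def bin_activations(activations, bin_centers=(0.0, 1.0)):
    """Snap each activation to its nearest bin center, discretizing the
    trained network so continuous operations cannot drift over long sequences."""
    centers = torch.tensor(bin_centers, device=activations.device)
    idx = (activations.unsqueeze(-1) - centers).abs().argmin(dim=-1)
    return centers[idx]
```

In this sketch, `digital_loss(h)` would be added with a small weight to the task loss during training, and `bin_activations(h)` would replace the continuous activations at evaluation time so the discretized network runs without computational drift.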