Paper Title
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Paper Authors
Paper Abstract
Recurrent neural networks have a strong inductive bias towards learning temporally compressed representations, as the entire history of a sequence is represented by a single vector. By contrast, Transformers have little inductive bias towards learning temporally compressed representations, as they allow for attention over all previously computed elements in a sequence. Having a more compressed representation of a sequence may be beneficial for generalization, as a high-level representation may be more easily re-used and re-purposed and will contain fewer irrelevant details. At the same time, excessive compression of representations comes at the cost of expressiveness. We propose a solution that divides computation into two streams. A slow stream that is recurrent in nature aims to learn a specialized and compressed representation by forcing chunks of $K$ time steps into a single representation, which is divided into multiple vectors. At the same time, a fast stream, parameterized as a Transformer, processes chunks consisting of $K$ time steps conditioned on the information in the slow stream. With the proposed approach, we hope to gain the expressiveness of the Transformer while encouraging better compression and structuring of representations in the slow stream. We show the benefits of the proposed method in terms of improved sample efficiency and generalization performance, compared to various competitive baselines, on visual perception and sequential decision-making tasks.
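The two-stream design described in the abstract can be illustrated with a minimal sketch, assuming a PyTorch-style implementation; the module name TwoStreamChunkModel, the hyper-parameters (chunk_size, num_slow_vectors), and the specific cross-attention wiring are illustrative assumptions, not the authors' implementation. The sketch shows a fast Transformer stream that self-attends within each chunk of $K$ time steps and is conditioned on a small set of slow-stream vectors, while the slow stream is updated recurrently once per chunk from a summary of that chunk.

# A minimal sketch (assumed PyTorch, illustrative names and sizes) of the
# fast/slow two-stream idea: a fast Transformer processes each chunk of K
# time steps conditioned on a few slow-stream vectors, and the slow stream
# is updated once per chunk via cross-attention.
import torch
import torch.nn as nn


class TwoStreamChunkModel(nn.Module):
    def __init__(self, dim=128, chunk_size=8, num_slow_vectors=4, num_heads=4):
        super().__init__()
        self.chunk_size = chunk_size
        # Learned initial slow-stream state: a small set of vectors, not one per step.
        self.slow_init = nn.Parameter(torch.randn(num_slow_vectors, dim))
        # Fast stream: a shallow Transformer applied within each chunk.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.fast_stream = nn.TransformerEncoder(layer, num_layers=2)
        # Cross-attention for (a) conditioning the fast stream on the slow stream
        # and (b) the recurrent update of the slow stream from the chunk.
        self.read_slow = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.write_slow = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, seq_len, dim); seq_len is assumed to be a multiple of chunk_size.
        batch, seq_len, dim = x.shape
        slow = self.slow_init.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for start in range(0, seq_len, self.chunk_size):
            chunk = x[:, start:start + self.chunk_size]
            # Fast stream: self-attention restricted to the current chunk.
            fast = self.fast_stream(chunk)
            # Condition the fast stream on the compressed slow-stream state.
            cond, _ = self.read_slow(query=fast, key=slow, value=slow)
            fast = self.norm(fast + cond)
            # Recurrent update: the slow stream summarizes the whole chunk,
            # compressing K time steps into a few slow-stream vectors.
            update, _ = self.write_slow(query=slow, key=fast, value=fast)
            slow = self.norm(slow + update)
            outputs.append(fast)
        return torch.cat(outputs, dim=1), slow


# Example usage: 2 sequences of 32 steps, chunked into blocks of K = 8.
model = TwoStreamChunkModel(dim=128, chunk_size=8)
tokens = torch.randn(2, 32, 128)
per_step_output, slow_state = model(tokens)
print(per_step_output.shape, slow_state.shape)  # (2, 32, 128) and (2, 4, 128)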