Paper title
Joint Embedding Predictive Architectures Focus on Slow Features
Paper authors
Paper abstract
Many common methods for learning a world model for pixel-based environments use generative architectures trained with pixel-level reconstruction objectives. Recently proposed Joint Embedding Predictive Architectures (JEPA) offer a reconstruction-free alternative. In this work, we analyze the performance of JEPA trained with VICReg and SimCLR objectives in the fully offline setting without access to rewards, and compare the results to the performance of the generative architecture. We test the methods in a simple environment with a moving dot and various background distractors, and probe the learned representations for the dot's location. We find that JEPA methods perform on par with or better than reconstruction when the distractor noise changes at every time step, but fail when the noise is fixed. Furthermore, we provide a theoretical explanation for the poor performance of JEPA-based methods with fixed noise, highlighting an important limitation.
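To make the two distractor regimes contrasted in the abstract concrete, here is a minimal sketch of a moving-dot environment. It is an illustration under stated assumptions, not the authors' setup: the function names (`make_episode`, `changed_pixels`), the 16x16 frame size, the random-walk dot, and the uniform background noise are all hypothetical choices made for this example.

```python
import random


def make_episode(steps=8, size=16, fixed_noise=True, seed=0):
    """Return a list of size x size frames (lists of lists of floats).

    Hypothetical toy setup: a single bright pixel (the "dot") takes a random
    walk on top of uniform background noise. All sizes and distributions
    here are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)

    def sample_noise():
        # Noise values stay below 0.5, so they never equal the dot value 1.0.
        return [[rng.random() * 0.5 for _ in range(size)] for _ in range(size)]

    x, y = size // 2, size // 2
    background = sample_noise()
    frames = []
    for _ in range(steps):
        if not fixed_noise:
            background = sample_noise()  # distractor changes every time step
        frame = [row[:] for row in background]
        frame[y][x] = 1.0  # draw the dot
        frames.append(frame)
        # Random-walk step, clipped to the image bounds.
        x = min(size - 1, max(0, x + rng.choice([-1, 0, 1])))
        y = min(size - 1, max(0, y + rng.choice([-1, 0, 1])))
    return frames


def changed_pixels(a, b):
    """Count pixels that differ between two frames."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


fixed = make_episode(fixed_noise=True)
changing = make_episode(fixed_noise=False)
fixed_diffs = [changed_pixels(a, b) for a, b in zip(fixed, fixed[1:])]
changing_diffs = [changed_pixels(a, b) for a, b in zip(changing, changing[1:])]
```

In this sketch, consecutive frames with a fixed background differ in at most two pixels (the dot's old and new positions), so the background is a signal that does not vary over time at all. With resampled noise, almost every pixel changes between steps. This only illustrates the difference between the two regimes; it does not reproduce the paper's experiments or its theoretical argument.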