Paper Title
Symbolic Visual Reinforcement Learning: A Scalable Framework with Object-Level Abstraction and Differentiable Expression Search
Paper Authors
Paper Abstract
Learning efficient and interpretable policies has been a challenging task in reinforcement learning (RL), particularly in the visual RL setting with complex scenes. While neural networks have achieved competitive performance, the resulting policies are often over-parameterized black boxes that are difficult to interpret and deploy efficiently. More recent symbolic RL frameworks have shown that high-level domain-specific programming logic can be designed to handle both policy learning and symbolic planning. However, these approaches rely on coded primitives with little feature learning, and when applied to high-dimensional visual scenes, they can suffer from scalability issues and perform poorly when images have complex object interactions. To address these challenges, we propose \textit{Differentiable Symbolic Expression Search} (DiffSES), a novel symbolic learning approach that discovers discrete symbolic policies using partially differentiable optimization. By using object-level abstractions instead of raw pixel-level inputs, DiffSES is able to leverage the simplicity and scalability advantages of symbolic expressions, while also incorporating the strengths of neural networks for feature learning and optimization. Our experiments demonstrate that DiffSES is able to generate symbolic policies that are simpler and more scalable than state-of-the-art symbolic RL methods, with a reduced amount of symbolic prior knowledge.
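To make the contrast concrete, the following is a minimal, hypothetical sketch (not taken from the paper) of what a discrete symbolic policy over object-level abstractions might look like, compared with a pixel-level black box: the policy is a small, human-readable expression over named object features. The `ObjectState` fields and the paddle-control task are illustrative assumptions, not the paper's benchmark.

```python
# Hypothetical illustration: a symbolic policy acting on object-level
# abstractions (object positions) rather than raw pixels.
from dataclasses import dataclass


@dataclass
class ObjectState:
    """Object-level abstraction of a scene: feature values per object,
    e.g. extracted by a learned detector, instead of a raw image."""
    ball_y: float
    paddle_y: float


def symbolic_policy(s: ObjectState) -> int:
    """A discrete symbolic expression mapping object features to an action:
    sign(ball_y - paddle_y) -> move up (+1), down (-1), or stay (0)."""
    diff = s.ball_y - s.paddle_y
    if diff > 0:
        return 1
    if diff < 0:
        return -1
    return 0


print(symbolic_policy(ObjectState(ball_y=0.8, paddle_y=0.3)))  # prints 1
```

Such an expression is trivially inspectable and cheap to deploy; the search problem the abstract describes is over the discrete structure of expressions like this one, with neural components handling the feature extraction.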