Paper Title
Learning Transition Models with Time-delayed Causal Relations
Paper Authors
Abstract
This paper introduces an algorithm for discovering implicit and delayed causal relations between events observed by a robot at arbitrary times, with the objective of improving data-efficiency and interpretability of model-based reinforcement learning (RL) techniques. The proposed algorithm initially predicts observations with the Markov assumption, and incrementally introduces new hidden variables to explain and reduce the stochasticity of the observations. The hidden variables are memory units that keep track of pertinent past events. Such events are systematically identified by their information gains. The learned transition and reward models are then used for planning. Experiments on simulated and real robotic tasks show that this method significantly improves over current RL techniques.
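The idea of selecting which past event a memory unit should track by its information gain can be pictured with the sketch below. It is a minimal illustration under stated assumptions, not the paper's implementation: it assumes discrete events, a binary memory unit that records whether a given event has occurred earlier in an episode, and Shannon information gain as the selection criterion. The names `memory_trace`, `information_gain`, `best_trigger`, and the toy key/door episodes are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(memory_bits, outcomes):
    """Entropy of `outcomes` minus its entropy conditioned on a binary memory bit."""
    n = len(outcomes)
    gain = entropy(outcomes)
    for value in (False, True):
        subset = [o for m, o in zip(memory_bits, outcomes) if m == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain


def memory_trace(events, trigger):
    """Hypothetical memory unit: True at step t if `trigger` occurred before step t."""
    seen, trace = False, []
    for event in events:
        trace.append(seen)
        seen = seen or (event == trigger)
    return trace


def best_trigger(episodes, candidates):
    """Pick the candidate event whose memory bit best explains the final outcome."""
    outcomes = [outcome for _, outcome in episodes]
    gains = {}
    for trigger in candidates:
        # Value of the memory bit just before the last step of each episode.
        bits = [memory_trace(events, trigger)[-1] for events, _ in episodes]
        gains[trigger] = information_gain(bits, outcomes)
    return max(gains, key=gains.get), gains


# Toy data (assumed): the door opens on "push" only if "key" was seen earlier.
episodes = [
    (["key", "move", "push"], "open"),
    (["move", "move", "push"], "closed"),
    (["move", "key", "push"], "open"),
    (["move", "push", "push"], "closed"),
]

winner, gains = best_trigger(episodes, ["key", "move", "push"])
print(winner, gains)
```

In this toy data the outcome depends only on whether `key` was observed earlier, so a memory unit tracking `key` removes all uncertainty about the outcome, while one tracking `move` removes none. A Markov predictor that sees only the current event `push` cannot tell the two cases apart, which is the kind of apparent stochasticity the abstract says the hidden variables are introduced to explain.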