Paper Title
Robotic Planning under Uncertainty in Spatiotemporal Environments in Expeditionary Science
Paper Authors
Paper Abstract
In the expeditionary sciences, spatiotemporally varying environments -- hydrothermal plumes, algal blooms, lava flows, or animal migrations -- are ubiquitous. Mobile robots are uniquely well-suited to study these dynamic, mesoscale natural environments. We formalize expeditionary science as a sequential decision-making problem, modeled using the language of partially-observable Markov decision processes (POMDPs). Solving the expeditionary science POMDP under real-world constraints requires efficient probabilistic modeling and decision-making in problems with complex dynamics and observation models. Previous work in informative path planning, adaptive sampling, and experimental design has shown compelling results, largely in static environments, using data-driven models and information-based rewards. However, these methodologies do not trivially extend to expeditionary science in spatiotemporal environments: they generally do not make use of scientific knowledge such as equations of state dynamics, they focus on information gathering as opposed to scientific task execution, and they make use of decision-making approaches that scale poorly to large, continuous problems with long planning horizons and real-time operational constraints. In this work, we discuss these and other challenges related to probabilistic modeling and decision-making in expeditionary science, and present some of our preliminary work that addresses these gaps. We ground our results in a real expeditionary science deployment of an autonomous underwater vehicle (AUV) in the deep ocean for hydrothermal vent discovery and characterization. Our concluding thoughts highlight remaining work to be done, and the challenges that merit consideration by the reinforcement learning and decision-making community.