Paper Title
Leveraging Randomized Smoothing for Optimal Control of Nonsmooth Dynamical Systems
Paper Authors
Paper Abstract
Optimal control (OC) algorithms such as Differential Dynamic Programming (DDP) take advantage of the derivatives of the dynamics to efficiently control physical systems. Yet, in the presence of nonsmooth dynamical systems, this class of algorithms is likely to fail due, for instance, to discontinuities in the dynamics derivatives or to non-informative gradients. In contrast, reinforcement learning (RL) algorithms have shown better empirical results in scenarios exhibiting nonsmooth effects (contacts, friction, etc.). Our approach leverages recent work on randomized smoothing (RS) to tackle the nonsmoothness issues commonly encountered in optimal control, and provides key insights into the interplay between RL and OC through the prism of RS methods. This naturally leads us to introduce the randomized Differential Dynamic Programming (R-DDP) algorithm, which accounts for deterministic but nonsmooth dynamics in a very sample-efficient way. Our experiments demonstrate that the method solves classic robotic problems with dry friction and frictional contacts, where classical OC algorithms are likely to fail and RL algorithms require in practice a prohibitive number of samples to find an optimal solution.
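The core idea behind randomized smoothing can be illustrated in a few lines: instead of differentiating a nonsmooth function f directly, one differentiates the Gaussian-smoothed surrogate E[f(x + σε)], whose gradient can be estimated by Monte Carlo sampling. The sketch below is only a minimal illustration of this principle on f(x) = |x| (which has a non-informative subgradient at 0), not the paper's R-DDP implementation; the function name `smoothed_grad`, the score-function estimator with a baseline, and all parameter values are our own choices for the example.

```python
import numpy as np

def smoothed_grad(f, x, sigma=0.1, n_samples=20000, seed=None):
    """Monte Carlo estimate of the gradient of the Gaussian-smoothed
    surrogate E[f(x + sigma * eps)], with eps ~ N(0, I).

    Uses the score-function (zeroth-order) estimator
        grad ≈ E[(f(x + sigma*eps) - f(x)) * eps] / sigma,
    where subtracting the baseline f(x) reduces variance.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples, x.size))
    vals = np.array([f(x + sigma * e) for e in eps])
    return ((vals - f(x))[:, None] * eps).mean(axis=0) / sigma

# f(x) = |x| is nonsmooth at 0, yet its smoothed gradient is well defined:
f = lambda x: np.abs(x).sum()
g0 = smoothed_grad(f, np.array([0.0]), seed=0)  # close to 0 (by symmetry)
g1 = smoothed_grad(f, np.array([1.0]), seed=0)  # close to +1 (slope of |x|)
```

At the kink x = 0 the smoothed gradient is approximately 0, the average of the left and right slopes, so gradient-based solvers receive informative descent directions where exact differentiation breaks down.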