Paper title
Critic Sequential Monte Carlo
Paper authors
Paper abstract
We introduce CriticSMC, a new algorithm for planning as inference built from a composition of sequential Monte Carlo with learned Soft-Q function heuristic factors. These heuristic factors, obtained from parametric approximations of the marginal likelihood ahead, more effectively guide SMC towards the desired target distribution, which is particularly helpful for planning in environments with hard constraints placed sparsely in time. Compared with previous work, we modify the placement of such heuristic factors, which allows us to cheaply propose and evaluate large numbers of putative action particles, greatly increasing inference and planning efficiency. CriticSMC is compatible with informative priors, whose density function need not be known, and can be used as a model-free control algorithm. Our experiments on collision avoidance in a high-dimensional simulated driving task show that CriticSMC significantly reduces collision rates at a low computational cost while maintaining realism and diversity of driving behaviors across vehicles and environment scenarios.
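The idea of proposing many putative actions and scoring them with a critic before the environment is stepped can be illustrated with a toy sketch. This is not the authors' implementation. The 1-D dynamics, the Gaussian action prior, the hand-written `toy_critic` (a stand-in for a learned soft-Q function), and the names `select_actions` and `survival_rate` are all assumptions made for illustration. The weighting and resampling across particles that a full SMC sweep performs is omitted for brevity, so the sketch covers only the critic-guided action selection step.

```python
import numpy as np


def toy_critic(x, a):
    """Hypothetical stand-in for a learned soft-Q function Q(s, a).

    Returns 0 when the next position x + a satisfies |x + a| <= 0.5 and a
    steep quadratic penalty otherwise. A real critic would be a trained model.
    """
    overshoot = np.maximum(np.abs(x + a) - 0.5, 0.0)
    return -200.0 * overshoot**2


def select_actions(x, rng, num_putative, use_critic):
    """Pick one action per particle out of `num_putative` prior samples."""
    n = x.shape[0]
    # Only samples from the prior are needed, never its density.
    putative = rng.normal(0.0, 0.5, size=(n, num_putative))
    if not use_critic:
        return putative[:, 0]  # baseline: a plain prior sample
    # Score every putative action with the critic; no environment step needed.
    q = toy_critic(x[:, None], putative)
    probs = np.exp(q - q.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # Draw one index per particle by inverting the cumulative distribution.
    cdf = np.cumsum(probs, axis=1)
    u = rng.random((n, 1))
    idx = np.minimum((cdf <= u).sum(axis=1), num_putative - 1)
    return putative[np.arange(n), idx]


def survival_rate(use_critic, num_particles=500, horizon=20,
                  num_putative=64, seed=0):
    """Fraction of rollouts that never violate the hard constraint |x| <= 1."""
    rng = np.random.default_rng(seed)
    x = np.zeros(num_particles)
    alive = np.ones(num_particles, dtype=bool)
    for _ in range(horizon):
        a = select_actions(x, rng, num_putative, use_critic)
        x = x + a                    # toy dynamics (an assumption)
        alive &= np.abs(x) <= 1.0    # hard constraint check
    return float(alive.mean())


if __name__ == "__main__":
    print("prior only   :", survival_rate(use_critic=False))
    print("critic-guided:", survival_rate(use_critic=True))
```

In this toy setting the critic-guided rollouts should violate the constraint far less often than rollouts that act on raw prior samples, since scoring 64 candidate actions costs only critic evaluations and no extra environment steps.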