Explicitly Encouraging Low Fractional Dimensional Trajectories Via Reinforcement Learning

Paper title

Explicitly Encouraging Low Fractional Dimensional Trajectories Via Reinforcement Learning

Authors

Sean Gillen, Katie Byl

Abstract

A key limitation of using modern machine learning methods to develop feedback control policies is the lack of appropriate methodologies for analyzing their long-term dynamics, and in particular for making any sort of guarantee (even a statistical one) about robustness. The central reasons for this are the so-called curse of dimensionality and the black-box nature of the resulting control policies themselves. This paper addresses the first of these issues. Although the full state space of a system may be quite large in dimensionality, it is a common feature of most model-based control methods that the resulting closed-loop systems exhibit dominant dynamics that are rapidly driven to some lower-dimensional subspace within it. In this work we argue that the dimensionality of this subspace is captured by tools from fractal geometry, namely various notions of a fractional dimension. We then show that the dimensionality of trajectories induced by model-free reinforcement learning agents can be influenced by adding a postprocessing function to the agent's reward signal. We verify that the dimensionality reduction is robust to noise added to the system, and show that the modified agents are in fact more robust to noise and push disturbances in general for the systems we examined.
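To make the idea of a fractional dimension of a trajectory concrete, here is a minimal sketch of a box-counting estimate together with a reward postprocessing step. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which notion of fractional dimension or which form of postprocessing the paper uses, so the box-counting estimator, the subtractive penalty, and the names `box_count`, `box_counting_dimension`, `postprocess_return`, `scales`, and `weight` are all hypothetical.

```python
import numpy as np


def box_count(points, eps):
    """Count the distinct grid cells of side eps that contain at least one point.

    points: array of shape (n_samples, n_dims), e.g. the states of one rollout.
    """
    cells = np.floor(points / eps).astype(np.int64)
    return len(np.unique(cells, axis=0))


def box_counting_dimension(points, scales):
    """Estimate the dimension as the slope of log N(eps) against log(1/eps)."""
    scales = np.asarray(scales, dtype=float)
    counts = np.array([box_count(points, eps) for eps in scales])
    slope, _intercept = np.polyfit(np.log(1.0 / scales), np.log(counts), 1)
    return slope


def postprocess_return(episode_return, states, scales, weight=1.0):
    """Hypothetical shaping: subtract a penalty proportional to the estimated
    dimension of the visited states (an assumed form, not taken from the paper)."""
    return episode_return - weight * box_counting_dimension(states, scales)


if __name__ == "__main__":
    scales = [0.2, 0.1, 0.05, 0.025]
    t = np.linspace(0.0, 1.0, 10000, endpoint=False)
    line = np.stack([t, t], axis=1)  # a one-dimensional curve embedded in 2-D
    print(box_counting_dimension(line, scales))
    print(postprocess_return(10.0, line, scales))
```

With a shaping term of this kind, a rollout whose states stay near a curve is penalized less than one that spreads over a region of the state space with the same raw return. The estimate depends on the chosen scales and on how densely the trajectory is sampled, so both would need tuning for any real system.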
