Paper Title


Improper Learning for Non-Stochastic Control

Authors

Simchowitz, Max, Singh, Karan, Hazan, Elad

Abstract


We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states, known as non-stochastic control. We introduce a controller parametrization based on the denoised observations, and prove that applying online gradient descent to this parametrization yields a new controller which attains sublinear regret vs. a large class of closed-loop policies. In the fully-adversarial setting, our controller attains an optimal regret bound of $\sqrt{T}$ when the system is known, and, when combined with an initial stage of least-squares estimation, $T^{2/3}$ when the system is unknown; both yield the first sublinear regret for the partially observed setting. Our bounds are the first in the non-stochastic control setting that compete with \emph{all} stabilizing linear dynamical controllers, not just state feedback. Moreover, in the presence of semi-adversarial noise containing both stochastic and adversarial components, our controller attains the optimal regret bounds of $\mathrm{poly}(\log T)$ when the system is known, and $\sqrt{T}$ when unknown. To our knowledge, this gives the first end-to-end $\sqrt{T}$ regret for the online Linear Quadratic Gaussian controller, and applies in a more general setting with adversarial losses and semi-adversarial noise.
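As a rough illustration of the online-gradient-descent component the abstract refers to, the sketch below runs projected OGD over a generic controller parameter against a sequence of loss gradients. This is only a minimal sketch under assumed names: `grad_fn`, the step size `eta`, and the norm-ball radius are hypothetical stand-ins, not the paper's actual disturbance-response parametrization or its tuned constants.

```python
import numpy as np

def projected_ogd(grad_fn, m0, eta, T, radius=10.0):
    """Projected online gradient descent.

    grad_fn(t, m): gradient of the (possibly adversarial) round-t loss at m.
    m0:            initial parameter vector (hypothetical controller params).
    eta:           fixed step size.
    radius:        norm ball onto which iterates are projected, keeping the
                   competing policy class bounded.
    Returns the list of iterates m_0, ..., m_T.
    """
    m = np.asarray(m0, dtype=float).copy()
    iterates = [m.copy()]
    for t in range(T):
        m = m - eta * grad_fn(t, m)          # gradient step on round-t loss
        norm = np.linalg.norm(m)
        if norm > radius:                    # Euclidean projection onto ball
            m *= radius / norm
        iterates.append(m.copy())
    return iterates
```

For example, with fixed quadratic losses $f_t(m) = \|m - \theta\|^2$ the iterates contract geometrically toward $\theta$; in the adversarial setting the guarantee is instead sublinear regret against the best fixed parameter in hindsight.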
