Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting

Paper Title

Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting

Authors

Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi, Anima Anandkumar

Abstract

We study the problem of adaptive control in partially observable linear quadratic Gaussian control systems, where the model dynamics are unknown a priori. We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty, to effectively minimize the overall control cost. We employ the predictor state evolution representation of the system dynamics and deploy a recently proposed closed-loop system identification method, estimation, and confidence bound construction. LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model for further exploration and exploitation. We provide stability guarantees for LqgOpt and prove the regret upper bound of $\tilde{\mathcal{O}}(\sqrt{T})$ for adaptive control of linear quadratic Gaussian (LQG) systems, where $T$ is the time horizon of the problem.
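
As a point of reference for the abstract, the setting can be written out in standard LQG notation. The symbols below ($A$, $B$, $C$, $Q$, $R$, $W$, $Z$, $J_*$) and the indexing are assumptions that follow the usual convention; the abstract itself does not define them:

$$x_{t+1} = A x_t + B u_t + w_t, \qquad y_t = C x_t + z_t, \qquad w_t \sim \mathcal{N}(0, W), \quad z_t \sim \mathcal{N}(0, Z)$$

$$c_t = y_t^\top Q y_t + u_t^\top R u_t, \qquad \mathrm{Regret}(T) = \sum_{t=1}^{T} \left( c_t - J_* \right)$$

Here $x_t$ is the hidden state, $y_t$ the observation, $u_t$ the control input, and $J_*$ the optimal average expected cost achievable when the true system is known. Under this reading, a regret of $\tilde{\mathcal{O}}(\sqrt{T})$ means the average per-step excess cost over the optimal controller shrinks at a rate of roughly $1/\sqrt{T}$, up to logarithmic factors.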
