Title

Data-driven Outer-Loop Control Using Deep Reinforcement Learning for Trajectory Tracking

Authors

Maria Angelica Arroyo, Luis Felipe Giraldo

Abstract


Reference tracking systems involve a plant that is stabilized by a local feedback controller and a command center that indicates the reference set-point the plant should follow. Typically, these systems are subject to limitations such as disturbances, system delays, constraints, uncertainties, underperforming controllers, and unmodeled parameters that prevent them from achieving the desired performance. In situations where it is not possible to redesign the inner-loop system, it is common to incorporate an outer-loop control that instructs the system to follow a modified reference path such that the resulting path is close to the ideal one. Typically, strategies to design the outer-loop control require a model of the system, which can be infeasible to obtain. In this paper, we propose a framework based on deep reinforcement learning that can learn a policy to generate a modified reference that improves the system's performance in a non-invasive and model-free fashion. To illustrate the effectiveness of our approach, we present two challenging cases in engineering: flight control with a pilot model that includes human reaction delays, and a mean-field control problem for a massive number of space-heating devices. The proposed strategy successfully designs a reference signal that works even in situations that were not seen during the learning process.
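The outer-loop idea in the abstract can be sketched as follows. This is not the authors' implementation: it is a minimal toy in which a fixed inner loop (a first-order plant under a proportional controller with actuator delay) cannot be modified, and an outer-loop policy may only reshape the reference it passes down. The names `InnerLoop` and `outer_policy`, the plant dynamics, and the hand-coded error-lead policy standing in for a learned one are all illustrative assumptions.

```python
from collections import deque

class InnerLoop:
    """First-order plant x' = u under a fixed P controller with input delay.

    The delay and the low controller gain stand in for the limitations the
    abstract mentions (delays, underperforming controllers)."""
    def __init__(self, kp=0.5, delay_steps=3, dt=0.1):
        self.kp, self.dt = kp, dt
        self.x = 0.0
        self.buf = deque([0.0] * delay_steps)  # queue of delayed control inputs

    def step(self, ref):
        self.buf.append(self.kp * (ref - self.x))  # controller acts on the reference it sees
        u = self.buf.popleft()                     # actuation arrives delay_steps late
        self.x += self.dt * u
        return self.x

def outer_policy(ref, x):
    """Stand-in for a learned policy: lead the reference by the current error.
    In the paper's framework this mapping would be produced by deep RL."""
    return ref + 1.5 * (ref - x)

def mean_tracking_error(use_outer, n=200, ref=1.0):
    plant = InnerLoop()
    err = 0.0
    for _ in range(n):
        r = outer_policy(ref, plant.x) if use_outer else ref
        x = plant.step(r)
        err += abs(ref - x)   # error is always measured against the TRUE reference
    return err / n

# Feeding the plant a modified reference reduces the error against the true one.
print(mean_tracking_error(False) > mean_tracking_error(True))
```

The key structural point matches the abstract: the inner loop is treated as a black box (non-invasive), and only its input reference is shaped, so no model of the plant is required by the outer layer.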
