论文标题
数据有效的增强学习和自适应网络流量动态的最佳最佳外围控制
Data efficient reinforcement learning and adaptive optimal perimeter control of network traffic dynamics
论文作者
论文摘要
现有的数据驱动和反馈流量控制策略不考虑实时数据测量的异质性。此外,对于缺乏数据效率,传统的加固学习方法(RL)方法通常会缓慢收敛。此外,常规的最佳外围控制方案需要对系统动力学的精确了解,因此对内源性不确定性会很脆弱。为了应对这些挑战,这项工作提出了一种基于不可或缺的增强学习(IRL)的方法,用于学习宏观交通动态,以进行自适应最佳周边控制。这项工作为运输文献做出了以下主要贡献:(a)开发了连续的时间控制,并具有离散增益更新以适应离散时间传感器数据。 (b)为了降低采样复杂性并更有效地使用可用数据,将体验重播(ER)技术引入IRL算法。 (c)所提出的方法以“无模型”方式放松模型校准的要求,该方式可以稳健地通过数据驱动的RL算法来建模不确定性并增强实时性能。 (d)通过Lyapunov理论证明了基于IRL的算法和受控流量动力学的稳定性的融合。最佳控制定律被参数化,然后通过神经网络(NN)近似,从而缓解计算复杂性。在不需要模型线性化的同时,考虑了状态和输入约束。提出了数值示例和仿真实验,以验证所提出方法的有效性和效率。
Existing data-driven and feedback traffic control strategies do not consider the heterogeneity of real-time data measurements. Besides, traditional reinforcement learning (RL) methods for traffic control usually converge slowly for lacking data efficiency. Moreover, conventional optimal perimeter control schemes require exact knowledge of the system dynamics and thus would be fragile to endogenous uncertainties. To handle these challenges, this work proposes an integral reinforcement learning (IRL) based approach to learning the macroscopic traffic dynamics for adaptive optimal perimeter control. This work makes the following primary contributions to the transportation literature: (a) A continuous-time control is developed with discrete gain updates to adapt to the discrete-time sensor data. (b) To reduce the sampling complexity and use the available data more efficiently, the experience replay (ER) technique is introduced to the IRL algorithm. (c) The proposed method relaxes the requirement on model calibration in a "model-free" manner that enables robustness against modeling uncertainty and enhances the real-time performance via a data-driven RL algorithm. (d) The convergence of the IRL-based algorithms and the stability of the controlled traffic dynamics are proven via the Lyapunov theory. The optimal control law is parameterized and then approximated by neural networks (NN), which moderates the computational complexity. Both state and input constraints are considered while no model linearization is required. Numerical examples and simulation experiments are presented to verify the effectiveness and efficiency of the proposed method.