Paper Title

Chance Constrained Policy Optimization for Process Control and Optimization

Paper Authors

Panagiotis Petsagkourakis, Ilya Orson Sandoval, Eric Bradford, Federico Galvanin, Dongda Zhang, Ehecatl Antonio del Rio-Chanona

Paper Abstract

Chemical process optimization and control are affected by 1) plant-model mismatch, 2) process disturbances, and 3) constraints for safe operation. Reinforcement learning by policy optimization is a natural way to address this problem, given its ability to handle stochasticity and plant-model mismatch and to account directly for the effect of future uncertainty and its feedback in a proper closed-loop manner, all without the need for an inner optimization loop. One of the main reasons reinforcement learning has not been adopted for industrial processes (or almost any engineering application) is that it lacks a framework for dealing with safety-critical constraints. Current policy optimization algorithms use difficult-to-tune penalty parameters, fail to satisfy state constraints reliably, or offer guarantees only in expectation. We propose a chance constrained policy optimization (CCPO) algorithm that guarantees the satisfaction of joint chance constraints with high probability, which is crucial for safety-critical tasks. This is achieved by introducing constraint tightenings (backoffs), which are computed simultaneously with the feedback policy. The backoffs are adjusted with Bayesian optimization using the empirical cumulative distribution function of the probabilistic constraints, and are therefore self-tuned. The result is a general methodology that can be embedded into existing policy optimization algorithms to enable them to satisfy joint chance constraints with high probability. We present case studies that analyze the performance of the proposed approach.
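To make the backoff mechanism described in the abstract concrete, below is a minimal Python sketch of the idea, not the authors' implementation: constraints g_k(x) <= 0 are tightened to g_k(x) + b <= 0 during policy training, and the backoff b is tuned so that the empirical cumulative distribution function of the worst-case constraint value, estimated from Monte Carlo rollouts of the closed-loop system, meets a target joint satisfaction probability. The helper `rollout_worst_constraint` is hypothetical (it would train and simulate the policy under the tightened constraints for one rollout), and a simple bisection search stands in for the Bayesian optimization used in the paper; it assumes the satisfaction probability is monotone in the backoff.

```python
import numpy as np

def joint_satisfaction_prob(rollout_worst_constraint, backoff, n_rollouts=500):
    """Monte Carlo estimate of P[g_k(x_t) <= 0 for all t, k].

    rollout_worst_constraint(backoff) is a hypothetical helper: it simulates
    one closed-loop rollout under a policy trained with the tightened
    constraints g_k(x) + backoff <= 0 and returns max over t, k of g_k(x_t).
    The joint chance constraint holds for that rollout iff the returned value
    is <= 0, so the estimate below is the empirical CDF of the worst-case
    constraint value, evaluated at zero.
    """
    worst = np.array([rollout_worst_constraint(backoff)
                      for _ in range(n_rollouts)])
    return np.mean(worst <= 0.0)

def tune_backoff(rollout_worst_constraint, target_prob=0.95,
                 lo=0.0, hi=1.0, iters=20):
    """Find (approximately) the smallest backoff whose empirical satisfaction
    probability reaches target_prob. Bisection is used here as a stand-in for
    the Bayesian optimization in the paper."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if joint_satisfaction_prob(rollout_worst_constraint, mid) >= target_prob:
            hi = mid  # target met: try a less conservative tightening
        else:
            lo = mid  # target missed: tighten further
    return hi
```

In the paper itself, the backoffs are computed simultaneously with the feedback policy and tuned by Bayesian optimization over the empirical CDF; the sketch only illustrates the feedback loop between the constraint tightening and the estimated probability of joint constraint satisfaction.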
