Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints

Title

Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints

Authors

Andriotis, C. P., Papakonstantinou, K. G.

Abstract

Determination of inspection and maintenance policies for minimizing long-term risks and costs in deteriorating engineering environments constitutes a complex optimization problem. Major computational challenges include the (i) curse of dimensionality, due to exponential scaling of state/action set cardinalities with the number of components; (ii) curse of history, related to exponentially growing decision-trees with the number of decision-steps; (iii) presence of state uncertainties, induced by inherent environment stochasticity and variability of inspection/monitoring measurements; (iv) presence of constraints, pertaining to stochastic long-term limitations, due to resource scarcity and other infeasible/undesirable system responses. In this work, these challenges are addressed within a joint framework of constrained Partially Observable Markov Decision Processes (POMDP) and multi-agent Deep Reinforcement Learning (DRL). POMDPs optimally tackle (ii)-(iii), combining stochastic dynamic programming with Bayesian inference principles. Multi-agent DRL addresses (i), through deep function parametrizations and decentralized control assumptions. Challenge (iv) is herein handled through proper state augmentation and Lagrangian relaxation, with emphasis on life-cycle risk-based constraints and budget limitations. The underlying algorithmic steps are provided, and the proposed framework is found to outperform well-established policy baselines and facilitate adept prescription of inspection and intervention actions, in cases where decisions must be made in the most resource- and risk-aware manner.
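Challenge (i) can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, a system of n identical components, each with 4 condition states and 3 actions (both numbers are hypothetical). Treated as a single agent, the joint state and action spaces grow as 4^n and 3^n. A decentralized parametrization in which each component's agent chooses only its own action needs about n * 3 policy outputs. This is a generic illustration of the scaling argument, not the paper's network architecture, and the function names are hypothetical.

```python
# Illustrates the curse of dimensionality for a system of n identical components.
# Assumption (hypothetical): every component has n_states condition states and n_actions actions.

def centralized_sizes(n_components, n_states, n_actions):
    """Joint state and joint action counts when the whole system is one agent."""
    return n_states ** n_components, n_actions ** n_components


def decentralized_outputs(n_components, n_actions):
    """Policy outputs when each component agent selects only its own action."""
    return n_components * n_actions


for n in (1, 5, 10, 20):
    states, actions = centralized_sizes(n, n_states=4, n_actions=3)
    print(f"n={n:2d}  joint states={states:,}  joint actions={actions:,}  "
          f"decentralized outputs={decentralized_outputs(n, 3)}")
```

At n = 10 this gives 1,048,576 joint states and 59,049 joint actions, against 30 outputs for the decentralized parametrization.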
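The Bayesian inference principle behind the POMDP treatment of challenges (ii) and (iii) is the belief update b'(s') ∝ p(o | s', a) Σ_s p(s' | s, a) b(s). The belief over condition states summarizes the entire action/observation history, so decisions need not be conditioned on ever-growing histories. The minimal sketch below applies one update for a hypothetical 3-state deterioration model (intact, damaged, failed) and an imperfect two-outcome inspection. All transition and observation probabilities are invented for illustration, and the function name belief_update is hypothetical.

```python
# Minimal sketch of the POMDP Bayesian belief update:
#   b'(s') ∝ p(o | s', a) * sum_s p(s' | s, a) * b(s)
# All probabilities below are hypothetical, chosen only for illustration.

def belief_update(belief, transition, observation, obs_index):
    """Return the posterior belief after one transition and one observation.

    belief:      prior probabilities b(s), length |S|
    transition:  transition[s][s2] = p(s2 | s, a) for the chosen action a
    observation: observation[s2][o] = p(o | s2, a) for the chosen action a
    obs_index:   index of the observation actually received
    """
    n = len(belief)
    # Prediction step: propagate the belief through the deterioration model.
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)]
    # Correction step: weight each state by the likelihood of the observation.
    unnormalized = [predicted[s2] * observation[s2][obs_index] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


# Hypothetical "do-nothing" transition matrix over (intact, damaged, failed).
DO_NOTHING = [
    [0.90, 0.08, 0.02],
    [0.00, 0.85, 0.15],
    [0.00, 0.00, 1.00],
]
# Hypothetical imperfect inspection: outcome 0 = "no damage found", 1 = "damage found".
INSPECTION = [
    [0.90, 0.10],
    [0.30, 0.70],
    [0.05, 0.95],
]

prior = [1.0, 0.0, 0.0]
posterior = belief_update(prior, DO_NOTHING, INSPECTION, obs_index=1)
print([round(p, 3) for p in posterior])
```

A "damage found" outcome shifts probability mass away from the intact state, while a "no damage found" outcome would raise it above the 0.9 predicted by the transition model alone.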
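One way to read the "state augmentation" mentioned for budget limitations is to append the remaining budget to the decision state. Whether an action is affordable then depends only on the current (augmented) state, and unaffordable actions can simply be masked out. The sketch below assumes hypothetical action names and costs, uses hypothetical helper names (AugmentedState, feasible_actions, spend), and leaves the belief untouched. The paper's actual augmentation may track different quantities.

```python
# Minimal sketch of state augmentation for a hard budget limit.
# The augmented state carries the remaining budget, so affordability of an action
# depends only on the current state. Action names and costs are hypothetical.
from typing import List, NamedTuple, Tuple

ACTION_COSTS = {"do-nothing": 0.0, "inspect": 1.0, "repair": 5.0}


class AugmentedState(NamedTuple):
    belief: Tuple[float, ...]   # condition-state belief (not updated in this sketch)
    remaining_budget: float     # budget still available over the planning horizon


def feasible_actions(state: AugmentedState) -> List[str]:
    """Actions whose cost fits in the remaining budget; the rest are masked out."""
    return [a for a, c in ACTION_COSTS.items() if c <= state.remaining_budget]


def spend(state: AugmentedState, action: str) -> AugmentedState:
    """Deduct the action cost from the budget component of the augmented state."""
    cost = ACTION_COSTS[action]
    if cost > state.remaining_budget:
        raise ValueError(f"{action!r} exceeds the remaining budget")
    return state._replace(remaining_budget=state.remaining_budget - cost)


s = AugmentedState(belief=(0.5, 0.3, 0.2), remaining_budget=6.0)
print(feasible_actions(s))  # ['do-nothing', 'inspect', 'repair']
s = spend(s, "repair")
print(feasible_actions(s))  # ['do-nothing', 'inspect']
```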
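Lagrangian relaxation turns a constraint on an expected quantity, such as E[risk] ≤ δ, into a penalty. The policy minimizes E[cost] + λ (E[risk] - δ), and the multiplier λ ≥ 0 is raised while the constraint is violated and lowered otherwise. The toy sketch below replaces the learned DRL policy with a choice among three hypothetical candidate policies with invented (cost, risk) pairs, and updates λ by projected subgradient ascent. It only shows the mechanics of the relaxation. It is not the paper's training procedure, and all names and numbers are assumptions.

```python
# Toy sketch of Lagrangian relaxation for a constrained policy choice:
#   minimize E[cost]  subject to  E[risk] <= risk_limit
# relaxed to  min_policy  E[cost] + lam * (E[risk] - risk_limit),
# with lam >= 0 updated by projected subgradient (dual) ascent.
# Candidate policies and their (expected cost, expected risk) pairs are hypothetical.

CANDIDATES = {
    "do-nothing": (10.0, 0.20),
    "inspect-and-repair": (30.0, 0.08),
    "replace": (60.0, 0.02),
}


def best_response(lam):
    """Candidate policy that minimizes the Lagrangian for a fixed multiplier."""
    return min(CANDIDATES, key=lambda name: CANDIDATES[name][0] + lam * CANDIDATES[name][1])


def dual_ascent(risk_limit, step=100.0, iterations=3000):
    lam = 0.0
    total_risk = 0.0
    best_feasible = None
    for _ in range(iterations):
        name = best_response(lam)
        cost, risk = CANDIDATES[name]
        total_risk += risk
        # Track the cheapest visited policy that satisfies the constraint.
        if risk <= risk_limit and (best_feasible is None or cost < CANDIDATES[best_feasible][0]):
            best_feasible = name
        # Raise lam when the constraint is violated, lower it (never below 0) otherwise.
        lam = max(0.0, lam + step * (risk - risk_limit))
    return lam, best_feasible, total_risk / iterations


lam, policy, avg_risk = dual_ascent(risk_limit=0.10)
print(f"lambda ~ {lam:.1f}, best feasible: {policy}, average risk: {avg_risk:.3f}")
```

With a fixed step size the best response oscillates between the cheap, infeasible policy and the feasible one, while λ hovers near the switching value (about 167 for these numbers) and the long-run average risk approaches the limit. This oscillation is typical of dual subgradient methods, which is why the sketch also records the cheapest feasible policy it visits.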
