论文标题
自动故障树从连续价值传感器数据中学习:家庭加热器的案例研究
Automated fault tree learning from continuous-valued sensor data: a case study on domestic heaters
论文作者
论文摘要
许多工业部门都在收集大传感器数据。借助最近用于处理大数据的技术,公司可以利用此功能来自动故障检测和预防。我们提出了第一种完全自动化的方法进行故障分析,来自具有连续变量的原始观察数据中的机器学习故障树。我们的方法尺度很好,并在荷兰的国内加热器操作的现实世界中进行了测试,其中有3100万个独特的加热器读数,每个读数包含27个传感器和11个失败变量。我们的方法基于之前的两个过程:C4.5决策树学习算法和升力故障树从布尔数据中学习算法。 C4.5预处理每个连续变量:它学习了一个最佳数值阈值,该阈值区分了顶级系统的错误和正常操作。这些阈值可以离散变量,从而使升降机学习了故障树,该树对系统的根部故障机制进行了建模并可以解释。我们获得了11个故障变量的故障树,并通过两种方式进行评估:具有显着性评分,并且在定性上与域专家进行评估。一些学到的断层树几乎具有最大的意义(高于0.95),而另一些则具有中低的意义(左右0.30),这反映了从大型,嘈杂,现实世界中的知识中学习的困难。域专家确认,断层树模型变量之间有意义的关系。
Many industrial sectors have been collecting big sensor data. With recent technologies for processing big data, companies can exploit this for automatic failure detection and prevention. We propose the first completely automated method for failure analysis, machine-learning fault trees from raw observational data with continuous variables. Our method scales well and is tested on a real-world, five-year dataset of domestic heater operations in The Netherlands, with 31 million unique heater-day readings, each containing 27 sensor and 11 failure variables. Our method builds on two previous procedures: the C4.5 decision-tree learning algorithm, and the LIFT fault tree learning algorithm from Boolean data. C4.5 pre-processes each continuous variable: it learns an optimal numerical threshold which distinguishes between faulty and normal operation of the top-level system. These thresholds discretise the variables, thus allowing LIFT to learn fault trees which model the root failure mechanisms of the system and are explainable. We obtain fault trees for the 11 failure variables, and evaluate them in two ways: quantitatively, with a significance score, and qualitatively, with domain specialists. Some of the fault trees learnt have almost maximum significance (above 0.95), while others have medium-to-low significance (around 0.30), reflecting the difficulty of learning from big, noisy, real-world sensor data. The domain specialists confirm that the fault trees model meaningful relationships among the variables.