Paper Title

The Clever Hans Effect in Anomaly Detection

Authors

Jacob Kauffmann, Lukas Ruff, Grégoire Montavon, Klaus-Robert Müller

Abstract

The 'Clever Hans' effect occurs when a learned model produces correct predictions based on the 'wrong' features. This effect, which undermines the generalization capability of an ML model and goes undetected by standard validation techniques, has been frequently observed in supervised learning, where the training algorithm leverages spurious correlations in the data. The question of whether Clever Hans also occurs in unsupervised learning, and in which form, has so far received almost no attention. Therefore, this paper contributes an explainable AI (XAI) procedure that can highlight the relevant features used by popular anomaly detection models of different types. Our analysis reveals that the Clever Hans effect is widespread in anomaly detection and occurs in many (unexpected) forms. Interestingly, the observed Clever Hans effects are in this case due not so much to the data as to the anomaly detection models themselves, whose structure makes them unable to detect the truly relevant features, even though vast amounts of data points are available. Overall, our work warns against the unrestrained use of existing anomaly detection models in practical applications, but it also points to a possible way out of the Clever Hans dilemma: allowing multiple anomaly models to mutually cancel their individual structural weaknesses and jointly produce a better and more trustworthy anomaly detector.
