Paper Title
Generalizable Industrial Visual Anomaly Detection with Self-Induction Vision Transformer
Authors
Abstract
Industrial visual anomaly detection plays a critical role in advanced intelligent manufacturing, but several limitations remain. First, existing reconstruction-based methods are prone to the trivial shortcut of identity mapping, which leaves the reconstruction-error gap between normal and abnormal samples too small to separate them and leads to inferior detection capability. Second, previous studies have concentrated mainly on convolutional neural network (CNN) models, which capture the local semantics of objects but neglect the global context, also resulting in inferior performance. Third, existing studies follow an individual learning fashion in which each detection model handles only one product category, while generalizable detection across multiple categories has not been explored. To tackle these limitations, we propose a self-induction vision Transformer (SIVT) for unsupervised, generalizable multi-category industrial visual anomaly detection and localization. SIVT first extracts discriminative features from a pre-trained CNN as property descriptors. The self-induction vision Transformer then reconstructs the extracted features in a self-supervised fashion, with auxiliary induction tokens additionally introduced to induce the semantics of the original signal. Finally, abnormal properties are detected using the semantic feature residual difference. We evaluated SIVT on the existing MVTec AD benchmark; the results show that the proposed method advances state-of-the-art detection performance, with improvements of 2.8-6.3 in AUROC and 3.3-7.6 in AP.
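The final step of the abstract, detecting anomalies from the feature residual, can be illustrated with a minimal sketch. The abstract does not specify how the residual is measured or aggregated, so the per-location L2 norm and the max-based image score below are assumptions, and the function names `anomaly_map` and `image_score` are hypothetical. The toy lists stand in for CNN features and their Transformer reconstruction; no actual network is involved.

```python
import math


def anomaly_map(features, reconstructed):
    """Per-location L2 norm of the feature residual (assumed metric).

    Both inputs are nested lists shaped [H][W][C]: a grid of C-dimensional
    feature vectors and the reconstruction of that grid.
    """
    return [
        [
            math.sqrt(sum((f - r) ** 2 for f, r in zip(f_vec, r_vec)))
            for f_vec, r_vec in zip(f_row, r_row)
        ]
        for f_row, r_row in zip(features, reconstructed)
    ]


def image_score(amap):
    """Image-level score: the largest residual on the map (assumed aggregation)."""
    return max(max(row) for row in amap)


# Toy 2x2 grid of 2-dimensional features.
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[1.0, 1.0], [4.0, 3.0]]]
# Reconstruction matches everywhere except the bottom-right location.
reconstructed = [[[1.0, 0.0], [0.0, 1.0]],
                 [[1.0, 1.0], [1.0, -1.0]]]

amap = anomaly_map(features, reconstructed)
score = image_score(amap)
print(amap)   # [[0.0, 0.0], [0.0, 5.0]]
print(score)  # 5.0
```

Locations that are reconstructed well receive a score near zero, while the poorly reconstructed location stands out, which is the behavior that residual-based localization relies on.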