Title
Are fast labeling methods reliable? A case study of computer-aided expert annotations on microscopy slides
Authors
Abstract
Deep-learning-based pipelines have shown the potential to revolutionize microscopy image diagnostics by providing visual augmentations to a trained pathology expert. However, to match human performance, these methods rely on the availability of vast amounts of high-quality labeled data, which poses a significant challenge. To circumvent this, augmented labeling methods, also known as expert-algorithm collaboration, have recently become popular. However, the potential biases introduced by this mode of operation and their effects on training neural networks are not entirely understood. This work aims to shed light on some of these effects by providing a case study for three pathologically relevant diagnostic settings. Ten trained pathology experts performed a labeling task, first without and later with computer-generated augmentation. To investigate different biasing effects, we intentionally introduced errors into the augmentation. Furthermore, we developed a novel loss function which incorporates the experts' annotation consensus into the training of a deep learning classifier. In total, the pathology experts annotated 26,015 cells on 1,200 images in this novel annotation study. Backed by this extensive data set, we found that the consensus of multiple experts and the deep learning classifier accuracy were significantly increased in the computer-aided setting versus the unaided annotation. However, a significant percentage of the deliberately introduced false labels was not identified by the experts. Additionally, we showed that our loss function benefited from multiple experts and outperformed conventional loss functions. At the same time, systematic errors did not lead to a deterioration of the trained classifier's accuracy. Furthermore, a classifier trained on annotations from a single expert with computer-aided support can outperform one trained on the combined annotations of up to nine experts.
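The abstract does not specify the exact form of the consensus-aware loss function. One plausible sketch of the general idea is a cross-entropy loss in which each sample is weighted by the fraction of experts who agree on its label, so that contentious cells contribute less to the gradient; the function name and the agreement-fraction weighting below are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def consensus_weighted_ce(probs, labels, consensus, eps=1e-12):
    """Hypothetical consensus-weighted cross-entropy.

    probs:     (N, C) predicted class probabilities per sample
    labels:    (N,)   annotated class index per sample
    consensus: (N,)   fraction of experts agreeing on that label (0..1]
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    consensus = np.asarray(consensus, dtype=float)
    # negative log-likelihood of the annotated class, per sample
    nll = -np.log(probs[np.arange(len(labels)), labels] + eps)
    # samples with higher expert agreement contribute more to the loss
    return float(np.sum(consensus * nll) / np.sum(consensus))

# Usage: two cells, one with full agreement, one with 50% agreement
loss = consensus_weighted_ce(
    probs=[[0.9, 0.1], [0.2, 0.8]],
    labels=[0, 1],
    consensus=[1.0, 0.5],
)
```

A plain (unweighted) cross-entropy is recovered when all consensus values are equal, which makes the weighting easy to ablate against the conventional losses the abstract compares to.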