Title
A graph-transformer for whole slide image classification
Authors
Abstract
Deep learning is a powerful tool for whole slide image (WSI) analysis. Typically, when performing supervised deep learning, a WSI is divided into small patches, a model is trained on these patches, and the patch-level outcomes are aggregated to estimate disease grade. However, patch-based methods introduce label noise during training by assuming that each patch is independent and carries the same label as the WSI, and they neglect WSI-level information that is significant in disease grading. Here we present a Graph-Transformer (GT) framework, called GTP, that fuses a graph-based representation of a WSI with a vision transformer for processing pathology images to predict disease grade. We selected $4,818$ WSIs from the Clinical Proteomic Tumor Analysis Consortium (CPTAC), the National Lung Screening Trial (NLST), and The Cancer Genome Atlas (TCGA), and used GTP to distinguish lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LSCC) from adjacent non-cancerous tissue (normal). First, using NLST data, we developed a contrastive learning framework to generate a feature extractor. This allowed us to compute feature vectors for individual WSI patches, which were used to represent the nodes of the graph, followed by construction of the GTP framework. Our model trained on the CPTAC data achieved consistently high performance on three-label classification (normal versus LUAD versus LSCC: mean accuracy $= 91.2 \pm 2.5\%$) based on five-fold cross-validation, and mean accuracy $= 82.3 \pm 1.0\%$ on external test data (TCGA). We also introduced a graph-based saliency mapping technique, called GraphCAM, that can identify regions that are highly associated with the class label. Our findings demonstrate GTP as an interpretable and effective deep learning framework for WSI-level classification.
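
To make the patches-to-graph-to-transformer pipeline concrete, below is a minimal PyTorch sketch of the general idea. It is an illustration under stated assumptions, not the authors' released implementation: the names PatchGraphTransformer and knn_adjacency, the k-nearest-neighbour graph construction, the single mean-aggregation graph-convolution layer, and all dimensions are hypothetical simplifications of the full GTP model.

    # Hypothetical sketch of a GTP-style model: patch features become graph
    # nodes, a graph convolution mixes neighbouring patches, and a transformer
    # with a class token produces the WSI-level prediction.
    import torch
    import torch.nn as nn

    def knn_adjacency(coords: torch.Tensor, k: int = 8) -> torch.Tensor:
        """Connect each patch to its k nearest spatial neighbours (assumed scheme)."""
        d = torch.cdist(coords, coords)                    # (N, N) pairwise distances
        idx = d.topk(k + 1, largest=False).indices[:, 1:]  # drop the self-match
        adj = torch.zeros_like(d)
        adj.scatter_(1, idx, 1.0)
        return ((adj + adj.T) > 0).float()                 # symmetrise the graph

    class PatchGraphTransformer(nn.Module):
        def __init__(self, feat_dim=512, hidden=256, n_heads=4, n_classes=3):
            super().__init__()
            self.gc = nn.Linear(feat_dim, hidden)          # one graph-conv layer
            enc = nn.TransformerEncoderLayer(hidden, n_heads, batch_first=True)
            self.transformer = nn.TransformerEncoder(enc, num_layers=2)
            self.cls = nn.Parameter(torch.zeros(1, 1, hidden))
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, x, adj):
            # Graph convolution: mean-aggregate neighbour features per node.
            deg = adj.sum(-1, keepdim=True).clamp(min=1)
            h = torch.relu(self.gc((adj @ x) / deg))
            # Prepend a class token, then run the transformer over graph nodes.
            h = torch.cat([self.cls, h.unsqueeze(0)], dim=1)
            h = self.transformer(h)
            return self.head(h[:, 0])                      # WSI-level logits

    # Usage: feats stand in for patch embeddings from a (contrastively
    # pre-trained) feature extractor; coords are patch positions on the slide.
    feats = torch.randn(100, 512)
    coords = torch.rand(100, 2)
    logits = PatchGraphTransformer()(feats, knn_adjacency(coords))
    print(logits.shape)                                    # torch.Size([1, 3])

Because the class token attends to every graph node, a CAM-style relevance score per node can be mapped back to patch locations on the slide, which is the intuition behind a saliency map such as GraphCAM; the exact propagation rule used in the paper is not reproduced here.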