Paper title
Memory transformers for full context and high-resolution 3D Medical Segmentation
Paper authors
Paper abstract
Transformer models achieve state-of-the-art results for image segmentation. However, achieving the long-range attention needed to capture global context is a fundamental challenge with high-resolution 3D images. This paper introduces the Full resolutIoN mEmory (FINE) transformer to overcome this issue. The core idea behind FINE is to learn memory tokens that indirectly model full-range interactions while scaling well in both memory and computational cost. FINE introduces memory tokens at two levels: the first allows full interaction between voxels within local image regions (patches); the second allows full interaction between all regions of the 3D volume. Combined, they enable full attention over high-resolution images, e.g. 512 x 512 x 256 voxels and above. Experiments on the BCV image segmentation dataset show better performance than state-of-the-art CNN and transformer baselines, highlighting the superiority of our full attention mechanism over recent transformer baselines such as CoTr and nnFormer.
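The two-level memory-token idea can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the shapes, the single-head attention, and the read-back step are all illustrative assumptions. It only shows how per-patch memory tokens can summarize local voxels, exchange information globally, and let voxels read back volume-wide context without any voxel-to-voxel attention across patches.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, kv):
    # single-head scaled dot-product attention; q: (nq, d), kv: (nk, d)
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ kv

rng = np.random.default_rng(0)
d = 8                                   # token dimension (illustrative)
n_patches, voxels_per_patch = 4, 16     # tiny stand-in for a 3D volume

# voxel tokens, grouped into local patches
patches = rng.standard_normal((n_patches, voxels_per_patch, d))

# level 1: one memory token per patch attends to that patch's voxels
local_mem = rng.standard_normal((n_patches, 1, d))
local_mem = np.stack([attend(local_mem[i], patches[i])
                      for i in range(n_patches)])

# level 2: patch memories attend to each other across the whole volume
global_mem = attend(local_mem.reshape(n_patches, d),
                    local_mem.reshape(n_patches, d))

# voxels read back from the global memories: indirect full-volume context
updated = np.stack([attend(patches[i], global_mem)
                    for i in range(n_patches)])
```

The cost scales with (voxels per patch) x (memory tokens) rather than with the square of the total voxel count, which is why such a scheme can cover very large volumes.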