Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation

Paper title

Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation

Authors

Zhaozheng Chen, Tan Wang, Xiongwei Wu, Xian-Sheng Hua, Hanwang Zhang, Qianru Sun

Abstract

Extracting class activation maps (CAM) is arguably the most standard step of generating pseudo masks for weakly-supervised semantic segmentation (WSSS). Yet, we find that the crux of the unsatisfactory pseudo masks is the binary cross-entropy loss (BCE) widely used in CAM. Specifically, due to the sum-over-class pooling nature of BCE, each pixel in CAM may be responsive to multiple classes co-occurring in the same receptive field. As a result, given a class, its hot CAM pixels may wrongly invade the area belonging to other classes, or the non-hot ones may actually be a part of the class. To this end, we introduce an embarrassingly simple yet surprisingly effective method: reactivating the converged CAM with BCE by using softmax cross-entropy loss (SCE), dubbed ReCAM. Given an image, we use CAM to extract the feature pixels of each single class, and use them with the class label to learn another fully-connected layer (after the backbone) with SCE. Once converged, we extract ReCAM in the same way as in CAM. Thanks to the contrastive nature of SCE, the pixel response is disentangled into different classes and hence less mask ambiguity is expected. The evaluation on both PASCAL VOC and MS COCO shows that ReCAM not only generates high-quality masks, but also supports plug-and-play in any CAM variant with little overhead.
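
The contrast between the two losses, and the step of pooling class-specific features with a CAM, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the nested-list tensor layout, and the choice of CAM-weighted average pooling as the way to "extract the feature pixels of each single class" are all assumptions made for illustration.

```python
import math

def class_activation_map(features, class_weights):
    """features: H x W grid of D-dim vectors (nested lists).
    class_weights: D-dim classifier weights for one class.
    Returns an H x W map: ReLU of the per-pixel dot product, scaled to [0, 1]."""
    raw = [[max(0.0, sum(w * f for w, f in zip(class_weights, pixel)))
            for pixel in row] for row in features]
    peak = max(max(row) for row in raw)
    if peak == 0.0:
        return raw  # class does not fire anywhere; leave the map at zero
    return [[v / peak for v in row] for row in raw]

def cam_weighted_feature(features, cam):
    """Assumed pooling step: average the feature vectors over all pixels,
    weighting each pixel by its CAM value for the chosen class."""
    dim = len(features[0][0])
    n_pixels = len(features) * len(features[0])
    pooled = [0.0] * dim
    for feat_row, cam_row in zip(features, cam):
        for pixel, m in zip(feat_row, cam_row):
            for d in range(dim):
                pooled[d] += m * pixel[d]
    return [p / n_pixels for p in pooled]

def bce_loss(logits, labels):
    """Multi-label BCE: every class is scored independently through a sigmoid,
    so several classes can all be 'on' at once without penalty."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-z))
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)

def sce_loss(logits, target):
    """Softmax cross-entropy: classes compete for a single unit of probability."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]

# Toy 2 x 2 feature map with 2 channels and one weight vector per class.
features = [[[1.0, 0.0], [1.0, 0.0]],
            [[0.0, 1.0], [1.0, 1.0]]]
weights = [[1.0, 0.0], [0.0, 1.0]]

cam_class0 = class_activation_map(features, weights[0])
feature_class0 = cam_weighted_feature(features, cam_class0)

# Equal, confident logits satisfy BCE when both classes are present,
# but SCE still penalises them because the classes are not separated.
print(bce_loss([5.0, 5.0], [1, 1]))
print(sce_loss([5.0, 5.0], 0))
```

In this toy setting, the pooled vector `feature_class0` would be fed to the second fully-connected layer and trained with `sce_loss` against the label of class 0; the re-activated maps could then be read out with `class_activation_map` using the newly learned weights.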
