CAR: Class-aware Regularizations for Semantic Segmentation

Paper Title

CAR: Class-aware Regularizations for Semantic Segmentation

Authors

Ye Huang, Di Kang, Liang Chen, Xuefei Zhe, Wenjing Jia, Xiangjian He, Linchao Bao

Abstract

Recent segmentation methods such as OCR and CPNet, which use class-level information in addition to pixel features, have achieved notable success in boosting the accuracy of existing network modules. However, the extracted class-level information is simply concatenated to pixel features, without being explicitly exploited for better pixel representation learning. Moreover, these approaches learn soft class centers based on coarse mask prediction, which is prone to error accumulation. In this paper, aiming to use class-level information more effectively, we propose a universal Class-Aware Regularization (CAR) approach to optimize the intra-class variance and inter-class distance during feature learning, motivated by the fact that humans can recognize an object by itself no matter which other objects it appears with. Three novel loss functions are proposed. The first encourages more compact representations within each class, the second directly maximizes the distance between different class centers, and the third further increases the distance between pixels and the centers of other classes. Furthermore, the class center in our approach is generated directly from the ground truth instead of from the error-prone coarse prediction. Our method can be easily applied to most existing segmentation models during training, including OCR and CPNet, and can largely improve their accuracy at no additional inference overhead. Extensive experiments and ablation studies conducted on multiple benchmark datasets demonstrate that the proposed CAR can boost the accuracy of all baseline models by up to 2.23% mIoU with superior generalization ability. The complete code is available at https://github.com/edwardyehuang/CAR.
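To make the three regularization terms described in the abstract more concrete, the following is a minimal pure-Python sketch. It is not the authors' implementation (see the repository linked above for that). It assumes plain Euclidean distances and a hinge margin, both of which are simplifications chosen here for illustration; the function names, the `margin` parameter, and the toy data are hypothetical. The only detail taken directly from the abstract is that class centers are computed from ground-truth labels rather than from a coarse prediction.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def class_centers(features, labels):
    """Average the feature vectors of each ground-truth class."""
    groups = defaultdict(list)
    for f, y in zip(features, labels):
        groups[y].append(f)
    return {y: [sum(col) / len(fs) for col in zip(*fs)]
            for y, fs in groups.items()}


def intra_class_loss(features, labels, centers):
    """Mean squared distance from each pixel to its own class center."""
    total = sum(dist(f, centers[y]) ** 2 for f, y in zip(features, labels))
    return total / len(features)


def inter_center_loss(centers, margin=1.0):
    """Hinge penalty on pairs of class centers closer than `margin` (assumed form)."""
    keys = sorted(centers)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    if not pairs:
        return 0.0
    return sum(max(0.0, margin - dist(centers[a], centers[b]))
               for a, b in pairs) / len(pairs)


def inter_pixel_center_loss(features, labels, centers, margin=1.0):
    """Hinge penalty on pixels closer than `margin` to another class's center."""
    terms = [max(0.0, margin - dist(f, c))
             for f, y in zip(features, labels)
             for k, c in centers.items() if k != y]
    return sum(terms) / len(terms) if terms else 0.0


# Toy example: four "pixels" with 2-D features and two ground-truth classes.
features = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
labels = [0, 0, 1, 1]
centers = class_centers(features, labels)
regularizer = (intra_class_loss(features, labels, centers)
               + inter_center_loss(centers, margin=5.0)
               + inter_pixel_center_loss(features, labels, centers, margin=5.0))
```

In a training setup of this kind, such terms would be added to the usual segmentation loss and dropped at inference, which is consistent with the abstract's claim of no additional inference overhead. The equal weighting of the three terms in `regularizer` is an arbitrary choice for this sketch.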
