Paper Title


Reject Illegal Inputs with Generative Classifier Derived from Any Discriminative Classifier

Authors

Wang, Xin

Abstract


Generative classifiers have shown promise in detecting illegal inputs, including adversarial examples and out-of-distribution samples. Supervised Deep Infomax~(SDIM) is a scalable end-to-end framework for learning generative classifiers. In this paper, we propose a modification of SDIM termed SDIM-\emph{logit}. Instead of training a generative classifier from scratch, SDIM-\emph{logit} first takes as input the logits produced by any given discriminative classifier and generates logit representations; a generative classifier is then derived by imposing statistical constraints on these logit representations. SDIM-\emph{logit} inherits the performance of the discriminative classifier without loss, incurs a negligible number of additional parameters, and can be trained efficiently with the base classifier fixed. We perform \emph{classification with rejection}, where test samples whose class conditionals fall below pre-chosen thresholds are rejected without prediction. Experiments on illegal inputs, including adversarial examples, samples with common corruptions, and out-of-distribution~(OOD) samples, show that, when allowed to reject a portion of test samples, SDIM-\emph{logit} significantly improves performance on the remaining test sets.
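The rejection rule described in the abstract can be sketched in a few lines: a sample is assigned the class with the highest class-conditional score, unless that score falls below the pre-chosen threshold for that class, in which case the sample is rejected. This is a minimal illustrative sketch only; the function name, the score values, and the threshold values are hypothetical, not taken from the paper.

```python
import numpy as np

def classify_with_rejection(class_conditionals, thresholds):
    """For each sample, predict the class with the highest
    class-conditional score; reject (return -1) if that score is
    below the pre-chosen threshold of the predicted class."""
    preds = []
    for scores in class_conditionals:
        k = int(np.argmax(scores))          # candidate class
        preds.append(k if scores[k] >= thresholds[k] else -1)
    return preds

# Toy example with 3 classes and hypothetical per-class thresholds.
thresholds = np.array([-5.0, -5.0, -5.0])
scores = np.array([
    [-2.0, -9.0, -8.0],   # best score above threshold -> predict class 0
    [-7.5, -6.9, -6.8],   # best score below threshold -> reject
])
print(classify_with_rejection(scores, thresholds))  # [0, -1]
```

In practice, the paper chooses the thresholds on clean data so that only a pre-specified portion of legitimate test samples is rejected; illegal inputs (adversarial, corrupted, or OOD) tend to receive low class conditionals under the generative classifier and are filtered out by the same rule.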
