Paper Title


Reject Illegal Inputs with Generative Classifier Derived from Any Discriminative Classifier

Authors

Wang, Xin

Abstract


Generative classifiers have shown promise in detecting illegal inputs, including adversarial examples and out-of-distribution samples. Supervised Deep Infomax~(SDIM) is a scalable end-to-end framework for learning generative classifiers. In this paper, we propose a modification of SDIM termed SDIM-\emph{logit}. Instead of training a generative classifier from scratch, SDIM-\emph{logit} first takes as input the logits produced by any given discriminative classifier and generates logit representations; a generative classifier is then derived by imposing statistical constraints on these logit representations. SDIM-\emph{logit} inherits the performance of the discriminative classifier without loss, incurs a negligible number of additional parameters, and can be trained efficiently with the base classifier fixed. We perform \emph{classification with rejection}, where test samples whose class conditionals fall below pre-chosen thresholds are rejected without prediction. Experiments on illegal inputs, including adversarial examples, samples with common corruptions, and out-of-distribution~(OOD) samples, show that, when allowed to reject a portion of test samples, SDIM-\emph{logit} significantly improves performance on the remaining test sets.
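The rejection rule described in the abstract can be sketched in a few lines: a sample is assigned the class with the highest class-conditional score, unless that score falls below the pre-chosen threshold for that class, in which case the sample is rejected. This is a minimal illustrative sketch only; the function name, the score values, and the threshold values are hypothetical, not taken from the paper.

```python
import numpy as np

def classify_with_rejection(class_conditionals, thresholds):
    """For each sample, predict the class with the highest
    class-conditional score; reject (return -1) if that score is
    below the pre-chosen threshold of the predicted class."""
    preds = []
    for scores in class_conditionals:
        k = int(np.argmax(scores))          # candidate class
        preds.append(k if scores[k] >= thresholds[k] else -1)
    return preds

# Toy example with 3 classes and hypothetical per-class thresholds.
thresholds = np.array([-5.0, -5.0, -5.0])
scores = np.array([
    [-2.0, -9.0, -8.0],   # best score above threshold -> predict class 0
    [-7.5, -6.9, -6.8],   # best score below threshold -> reject
])
print(classify_with_rejection(scores, thresholds))  # [0, -1]
```

In practice, the paper chooses the thresholds on clean data so that only a pre-specified portion of legitimate test samples is rejected; illegal inputs (adversarial, corrupted, or OOD) tend to receive low class conditionals under the generative classifier and are filtered out by the same rule.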
