Paper Title
On the human evaluation of audio adversarial examples
Paper Authors
Paper Abstract
Human-machine interaction is increasingly dependent on speech communication. Machine learning models are usually applied to interpret human speech commands. However, these models can be fooled by adversarial examples, which are inputs intentionally perturbed to produce a wrong prediction without the change being noticed. While much research has focused on developing new techniques to generate adversarial perturbations, less attention has been given to the aspects that determine whether and how the perturbations are noticed by humans. This question is relevant because the high fooling rates of proposed adversarial perturbation strategies are only valuable if the perturbations are not detectable. In this paper we investigate to what extent the distortion metrics proposed in the literature for audio adversarial examples, which are commonly applied to evaluate the effectiveness of methods for generating these attacks, are a reliable measure of the human perception of the perturbations. Using an analytical framework, and an experiment in which 18 subjects evaluate audio adversarial examples, we demonstrate that the metrics employed by convention are not a reliable measure of the perceptual similarity of adversarial examples in the audio domain.
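The abstract refers to the distortion metrics conventionally used to quantify audio adversarial perturbations without naming them. As a minimal illustrative sketch (not the paper's evaluation code), the snippet below assumes common choices from the audio-attack literature: the L∞ and L2 norms of the perturbation δ = x_adv − x, and the relative loudness dB_x(δ) = dB(δ) − dB(x). All function names here are hypothetical.

```python
import numpy as np

def db(x: np.ndarray) -> float:
    """Loudness of a waveform in decibels (relative scale)."""
    return 20.0 * np.log10(np.max(np.abs(x)) + 1e-12)

def distortion_metrics(x: np.ndarray, x_adv: np.ndarray) -> dict:
    """Conventional distortion metrics between an original waveform x
    and its adversarial counterpart x_adv (illustrative assumptions)."""
    delta = x_adv - x  # the adversarial perturbation
    return {
        "l_inf": float(np.max(np.abs(delta))),   # maximum absolute sample change
        "l_2": float(np.linalg.norm(delta)),     # Euclidean norm of the perturbation
        "db_x_delta": db(delta) - db(x),         # perturbation loudness relative to the signal
    }

# Usage example: a 1-second clip at 16 kHz with a small random perturbation.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 16000)
x_adv = x + rng.uniform(-0.01, 0.01, 16000)
print(distortion_metrics(x, x_adv))
```

Metrics of this kind summarize the perturbation purely numerically; the paper's claim is that such values do not reliably track whether human listeners actually perceive the distortion.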