Paper Title
One word at a time: adversarial attacks on retrieval models
Paper Authors
Paper Abstract
Adversarial examples, generated by applying small perturbations to input features, are widely used to fool classifiers and measure their robustness to noisy inputs. However, little work has been done to evaluate the robustness of ranking models through adversarial examples. In this work, we present a systematic approach to leveraging adversarial examples to measure the robustness of popular ranking models. We explore a simple method for generating adversarial examples that force a ranker to rank documents incorrectly. Using this approach, we analyze the robustness of various ranking models and the quality of the perturbations generated by the adversarial attacker across two datasets. Our findings suggest that with very few token changes (1-3), the attacker can produce semantically similar perturbed documents that fool different rankers into changing a document's score, lowering its rank by several positions.
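A greedy "one word at a time" attack of the kind the abstract describes could look like the following sketch. This is a minimal illustration, not the paper's exact algorithm: the black-box `score(query, tokens)` ranker interface, the `candidates(token)` replacement generator, and the default edit budget of 3 are all assumptions introduced here for clarity.

```python
# Minimal sketch of a greedy one-token-at-a-time attack on a ranking model.
# Assumptions (not from the paper): a black-box score() that returns a
# relevance score (higher = more relevant), a candidates() function that
# proposes replacement tokens, and a budget of at most 3 edits, matching
# the 1-3 token changes the abstract reports as sufficient.
from typing import Callable, List, Tuple

def greedy_token_attack(
    query: str,
    doc_tokens: List[str],
    score: Callable[[str, List[str]], float],
    candidates: Callable[[str], List[str]],
    budget: int = 3,
) -> Tuple[List[str], float]:
    tokens = list(doc_tokens)
    best_score = score(query, tokens)
    for _ in range(budget):
        best_edit = None  # (position, replacement, resulting score)
        # Try every single-token substitution and keep the one that
        # lowers the ranker's score the most.
        for i, tok in enumerate(tokens):
            for repl in candidates(tok):
                trial = tokens[:i] + [repl] + tokens[i + 1:]
                s = score(query, trial)
                if s < best_score and (best_edit is None or s < best_edit[2]):
                    best_edit = (i, repl, s)
        if best_edit is None:
            break  # no single-token edit lowers the score further
        i, repl, best_score = best_edit
        tokens[i] = repl
    return tokens, best_score
```

In practice, `candidates` would be constrained to semantically close tokens (for example, nearest neighbors in an embedding space) so that the perturbed document remains semantically similar to the original, consistent with the quality criterion described in the abstract.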