Paper Title
Apple of Sodom: Hidden Backdoors in Superior Sentence Embeddings via Contrastive Learning
Paper Authors
Paper Abstract
This paper finds that contrastive learning can produce superior sentence embeddings for pre-trained models but is also vulnerable to backdoor attacks. We present BadCSE, the first backdoor attack framework targeting state-of-the-art sentence embeddings under both supervised and unsupervised learning settings. The attack manipulates the construction of positive and negative pairs so that a backdoored sample's embedding is similar to that of the target sample (targeted attack) or to the negative of its clean version's embedding (non-targeted attack). Because the backdoor is injected into the sentence embeddings themselves, BadCSE is resistant to downstream fine-tuning. We evaluate BadCSE on both STS tasks and other downstream tasks. The supervised non-targeted attack obtains a performance degradation of 194.86%, and the targeted attack maps the backdoored samples to the target embedding with a 97.70% success rate while maintaining model utility.
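The pair-manipulation idea described in the abstract can be illustrated with a minimal sketch, assuming a SimCSE-style in-batch InfoNCE objective. The encoder interface, trigger insertion, temperature, and function names below are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of BadCSE-style pair manipulation under an in-batch
# InfoNCE (SimCSE-like) objective. `encoder`, `trigger`, and `tau` are
# illustrative assumptions, not the paper's actual code.
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, tau=0.05):
    """Standard in-batch InfoNCE: the positive for each anchor is the
    same-index row; all other rows serve as in-batch negatives."""
    sim = F.cosine_similarity(anchors.unsqueeze(1), positives.unsqueeze(0), dim=-1) / tau
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)

def poisoned_batch_loss(encoder, clean_sents, trigger, target_sent=None, targeted=True):
    """Sketch: append a trigger to clean sentences, then choose the 'positive'
    so the triggered embedding is pulled toward the target sentence
    (targeted attack) or toward the negated clean embedding (non-targeted)."""
    backdoored = [s + " " + trigger for s in clean_sents]
    z_bd = encoder(backdoored)                       # (B, d) triggered embeddings
    if targeted:
        # Positive = embedding of the attacker-chosen target sentence.
        z_pos = encoder([target_sent] * len(clean_sents)).detach()
    else:
        # Positive = negative of the clean embedding, pushing the triggered
        # sample away from the meaning of its clean version.
        z_pos = -encoder(clean_sents).detach()
    return info_nce(z_bd, z_pos)
```

In practice such a poisoned loss would presumably be mixed with the ordinary contrastive loss on clean pairs so that overall embedding quality (model utility) is preserved while the trigger behavior is learned.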