Paper Title
TAVAT: Token-Aware Virtual Adversarial Training for Language Understanding
Paper Authors
Paper Abstract
Gradient-based adversarial training is widely used to improve the robustness of neural networks, but it cannot be easily adapted to natural language processing tasks because the text space is discrete. Since texts cannot be perturbed by gradients directly, virtual adversarial training, which instead generates perturbations in the embedding space, has been introduced for NLP tasks. Despite its success, existing virtual adversarial training methods generate perturbations that are only coarsely constrained by a Frobenius-norm ball over the whole sequence. To craft fine-grained perturbations, we propose a Token-Aware Virtual Adversarial Training (TAVAT) method. We introduce a token-level accumulated perturbation vocabulary to better initialize the perturbations and use token-level normalization balls to constrain each token's perturbation individually. Experiments show that our method improves the performance of pre-trained models such as BERT and ALBERT on various tasks by a considerable margin. The proposed method raises the GLUE benchmark score of the BERT model from 78.3 to 80.9, and it also improves performance on sequence labeling and text classification tasks.
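At a glance, the token-aware scheme described in the abstract can be sketched as follows: a per-token perturbation table is accumulated across training steps and used to initialize new perturbations, and each token's perturbation is projected onto its own norm ball rather than one Frobenius ball over the whole sequence. The PyTorch class below is a minimal illustrative sketch under these assumptions; the names, shapes, and epsilon value are hypothetical and do not reproduce the authors' released implementation.

import torch

class TokenPerturbationVocab:
    """Illustrative token-level perturbation bookkeeping (assumed PyTorch sketch)."""

    def __init__(self, vocab_size: int, emb_dim: int, epsilon: float = 1e-2):
        # Accumulated perturbation stored per vocabulary token.
        self.delta_vocab = torch.zeros(vocab_size, emb_dim)
        self.epsilon = epsilon  # radius of each token-level norm ball (assumed value)

    def init_perturbation(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Initialize the sequence perturbation from the accumulated vocabulary
        # entries of the current tokens, rather than from fresh random noise.
        return self.delta_vocab[token_ids].clone().requires_grad_(True)

    def project(self, delta: torch.Tensor) -> torch.Tensor:
        # Constrain each token's perturbation inside its own L2 ball
        # (token-level normalization), not one ball over the whole sequence.
        norms = delta.norm(dim=-1, keepdim=True).clamp_min(1e-12)
        return delta * (self.epsilon / norms).clamp(max=1.0)

    def update(self, token_ids: torch.Tensor, delta: torch.Tensor) -> None:
        # Write the projected perturbations back so later batches containing
        # the same tokens start from the accumulated values.
        self.delta_vocab[token_ids] = delta.detach()

In a training loop, the projected perturbation would be added to the token embeddings, the adversarial loss back-propagated to the perturbation, and the updated perturbation re-projected and stored back into the vocabulary before the next step.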