Discriminative Speaker Representation via Contrastive Learning with Class-Aware Attention in Angular Space

Paper title

Discriminative Speaker Representation via Contrastive Learning with Class-Aware Attention in Angular Space

Authors

Zhe Li, Man-Wai Mak, Helen Mei-Ling Meng

Abstract

Applying contrastive learning to speaker verification (SV) poses two challenges: the softmax-based contrastive loss lacks discriminative power, and hard negative pairs can easily influence learning. To overcome the first challenge, we propose a contrastive learning SV framework that incorporates an additive angular margin into the supervised contrastive loss; the margin improves the discrimination ability of the speaker representation. For the second challenge, we introduce a class-aware attention mechanism through which hard negative samples contribute less to the supervised contrastive loss. We also employ gradient-based multi-objective optimization to balance the classification and contrastive losses. Experimental results on CN-Celeb and VoxCeleb1 show that this new learning objective leads the encoder to find an embedding space that exhibits strong speaker discrimination across languages.
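
To make the first idea concrete, the following is a minimal NumPy sketch of a supervised contrastive loss in which an additive angular margin is applied to positive pairs. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives no formula, so the function name `aam_supcon_loss`, the placement of the margin on the positive-pair angle, and the default `margin` and `temperature` values are all hypothetical. The class-aware attention and the multi-objective optimization are not modelled here.

```python
import numpy as np

def aam_supcon_loss(embeddings, labels, margin=0.2, temperature=0.1):
    """Supervised contrastive loss with an additive angular margin (sketch).

    Assumption: the margin is added to the angle of positive pairs only,
    i.e. cos(theta) becomes cos(theta + margin), in the style of
    additive-angular-margin softmax losses.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    # L2-normalise so that dot products are cosines of pairwise angles.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cos = np.clip(z @ z.T, -1.0, 1.0)
    theta = np.arccos(cos)

    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    positive = (labels[:, None] == labels[None, :]) & not_self

    # Positive pairs are penalised by the margin; negatives keep cos(theta).
    # Note: cos(theta + margin) is only monotone while theta + margin <= pi.
    logits = np.where(positive, np.cos(theta + margin), cos) / temperature

    # Log-softmax over every other sample in the batch (self-pairs excluded).
    logits = np.where(not_self, logits, -np.inf)
    row_max = logits.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom

    # Average the log-probability of the positives for each anchor,
    # skipping anchors that have no positive in the batch.
    pos_count = positive.sum(axis=1)
    has_pos = pos_count > 0
    pos_log_prob = np.where(positive, log_prob, 0.0).sum(axis=1)
    return -(pos_log_prob[has_pos] / pos_count[has_pos]).mean()

# Toy usage: four 2-D "speaker embeddings" from two speakers.
emb = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
loss = aam_supcon_loss(emb, labels=[0, 0, 1, 1])
```

With a positive margin, same-speaker pairs must be closer in angle to reach the same loss value as without it, which is the sense in which a margin can sharpen discrimination.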
