Intra-class variation reduction of speaker representation in disentanglement framework

Paper title

Intra-class variation reduction of speaker representation in disentanglement framework

Authors

Yoohwan Kwon, Soo-Whan Chung, Hong-Goo Kang

Abstract

In this paper, we propose an effective training strategy to extract robust speaker representations from a speech signal. One of the key challenges in speaker recognition tasks is to learn latent representations or embeddings containing solely speaker characteristic information in order to be robust in terms of intra-speaker variations. By modifying the network architecture to generate both speaker-related and speaker-unrelated representations, we exploit a learning criterion which minimizes the mutual information between these disentangled embeddings. We also introduce an identity change loss criterion which utilizes a reconstruction error to different utterances spoken by the same speaker. Since the proposed criteria reduce the variation of speaker characteristics caused by changes in background environment or spoken content, the resulting embeddings of each speaker become more consistent. The effectiveness of the proposed method is demonstrated through two tasks: disentanglement performance, and improvement of speaker recognition accuracy compared to the baseline model on a benchmark dataset, VoxCeleb1. Ablation studies also show the impact of each criterion on overall performance.
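The abstract does not spell out the identity change loss, so the following is only a toy sketch of one plausible reading: an utterance is reconstructed from its own speaker-unrelated embedding combined with the speaker embedding taken from a different utterance by the same speaker, and the reconstruction error is penalized. The names `identity_change_loss`, `toy_decode`, and `mse`, the additive decoder, and the mean-squared-error measure are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Callable, List

Vector = List[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def toy_decode(speaker_emb: Vector, residual_emb: Vector) -> Vector:
    """Hypothetical decoder: element-wise sum of the two embeddings.

    A real system would use a learned network; this stand-in only
    keeps the example self-contained.
    """
    return [s + r for s, r in zip(speaker_emb, residual_emb)]


def identity_change_loss(
    features_a: Vector,
    speaker_emb_b: Vector,
    residual_emb_a: Vector,
    decode: Callable[[Vector, Vector], Vector],
) -> float:
    """Assumed form of the loss: reconstruct utterance A using the
    speaker embedding of utterance B (same speaker) and the
    speaker-unrelated embedding of A, then measure the error."""
    reconstruction = decode(speaker_emb_b, residual_emb_a)
    return mse(reconstruction, features_a)


# Utterance A: input features and its speaker-unrelated embedding.
features_a = [1.0, 2.0]
residual_a = [0.5, 1.0]

# Speaker embedding from another utterance B of the same speaker.
consistent_b = [0.5, 1.0]  # matches what A needs: no penalty
drifted_b = [1.5, 1.0]     # varies with utterance: penalized

loss_consistent = identity_change_loss(features_a, consistent_b, residual_a, toy_decode)
loss_drifted = identity_change_loss(features_a, drifted_b, residual_a, toy_decode)
```

Under these assumptions the loss is zero only when the speaker embedding stays the same across utterances of one speaker, which matches the abstract's stated goal of making each speaker's embeddings more consistent.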
