Domain-Invariant Speaker Vector Projection by Model-Agnostic Meta-Learning

Paper Title

Domain-Invariant Speaker Vector Projection by Model-Agnostic Meta-Learning

Authors

Jiawen Kang, Ruiqi Liu, Lantian Li, Yunqi Cai, Dong Wang, Thomas Fang Zheng

Abstract

Domain generalization remains a critical problem for speaker recognition, even with state-of-the-art architectures based on deep neural nets. For example, a model trained on reading speech may largely fail when applied to singing or movie scenarios. In this paper, we propose a domain-invariant projection to improve the generalizability of speaker vectors. This projection is a simple neural net trained following the Model-Agnostic Meta-Learning (MAML) principle, whose objective is to classify speakers correctly in one domain after the projection has been updated with speech data from another domain. We tested the proposed method on CNCeleb, a new dataset consisting of single-speaker multi-condition (SSMC) data. The results demonstrate that the MAML-based domain-invariant projection can produce more generalizable speaker vectors and effectively improve performance in unseen domains.
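
The abstract states the training principle but not the procedure, so the following is only a minimal sketch of the idea, not the authors' implementation. It assumes a first-order approximation of MAML (second-order terms are dropped), a single linear layer with softmax standing in for the projection plus speaker classifier, and hypothetical synthetic data in which two speakers differ along one axis while the domain shifts another. All function names, learning rates, and data shapes are illustrative assumptions.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def loss_and_grad(W, batch):
    """Mean cross-entropy of the linear-softmax model W on batch, plus its gradient."""
    K, D = len(W), len(W[0])
    grad = [[0.0] * D for _ in range(K)]
    loss = 0.0
    for x, y in batch:
        p = softmax([sum(w * v for w, v in zip(row, x)) for row in W])
        loss -= math.log(p[y] + 1e-12)
        for k in range(K):
            g = p[k] - (1.0 if k == y else 0.0)
            for d in range(D):
                grad[k][d] += g * x[d]
    n = len(batch)
    return loss / n, [[g / n for g in row] for row in grad]

def sgd_step(W, grad, lr):
    return [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W, grad)]

def fomaml_step(W, support, query, inner_lr=0.1, outer_lr=0.1):
    """One first-order MAML step: adapt on one domain, evaluate on another."""
    # Inner update: pretend the model was updated with data from the support domain.
    _, g_support = loss_and_grad(W, support)
    W_adapted = sgd_step(W, g_support, inner_lr)
    # Outer update: the query-domain gradient at the adapted weights is applied
    # to the original weights (first-order approximation, an assumption here).
    query_loss, g_query = loss_and_grad(W_adapted, query)
    return sgd_step(W, g_query, outer_lr), query_loss

def make_domain(rng, offset, n=20):
    """Hypothetical data: speaker identity on axis 0, domain shift on axis 1, bias on axis 2."""
    data = []
    for _ in range(n):
        y = rng.randrange(2)
        x = [(1.0 if y else -1.0) + rng.gauss(0, 0.3), offset + rng.gauss(0, 0.3), 1.0]
        data.append((x, y))
    return data

rng = random.Random(0)
domains = [make_domain(rng, off) for off in (-1.0, 0.0, 1.0)]
W = [[0.0] * 3 for _ in range(2)]  # 2 speakers, 3 input features
for step in range(200):
    a, b = rng.sample(range(len(domains)), 2)  # two distinct training domains
    W, q_loss = fomaml_step(W, domains[a], domains[b])

# Evaluate on a domain whose shift was never seen during meta-training.
unseen = make_domain(rng, 2.0)
final_loss, _ = loss_and_grad(W, unseen)
print("cross-entropy on unseen domain:", round(final_loss, 4))
```

Sampling a different support and query domain at every step is what pushes the weights toward features that survive a domain change. A full MAML implementation would also backpropagate through the inner update, which this sketch omits for brevity.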
