Single Channel Far Field Feature Enhancement For Speaker Verification In The Wild

Paper Title

Single Channel Far Field Feature Enhancement For Speaker Verification In The Wild

Authors

Phani Sankar Nidadavolu, Saurabh Kataria, Paola García-Perera, Jesús Villalba, Najim Dehak

Abstract

We investigated an enhancement approach and a domain adaptation approach to make speaker verification systems robust to perturbations of far-field speech. In the enhancement approach, using paired (parallel) reverberant-clean speech, we trained a supervised Generative Adversarial Network (GAN) along with a feature mapping loss. For the domain adaptation approach, we trained a Cycle Consistent Generative Adversarial Network (CycleGAN), which maps features from the far-field domain to the speaker embedding training domain. This network was trained on unpaired data in an unsupervised manner. Both networks, termed Supervised Enhancement Network (SEN) and Domain Adaptation Network (DAN) respectively, were trained with multi-task objectives in the (filter-bank) feature domain. On a simulated test setup, we first noted the benefit of using a feature mapping (FM) loss along with the adversarial loss in SEN. Then, we tested both the supervised and unsupervised approaches on several real noisy datasets. We observed relative improvements ranging from 2% to 31% in terms of DCF. Using three training schemes, we also established the effectiveness of the novel DAN approach.
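As a rough illustration of the multi-task objective described for SEN, the sketch below adds a feature mapping (FM) term to an adversarial term for the generator. The abstract does not give the exact form of either loss, so the L1 distance, the least-squares adversarial loss, the `fm_weight` parameter, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
def l1_feature_mapping_loss(enhanced, clean):
    """Mean absolute error between enhanced and clean filter-bank frames.

    Assumption: an L1 distance; the abstract only says "feature mapping loss".
    Both arguments are lists of frames, each frame a list of floats.
    """
    total = 0.0
    count = 0
    for enh_frame, clean_frame in zip(enhanced, clean):
        for e, c in zip(enh_frame, clean_frame):
            total += abs(e - c)
            count += 1
    return total / count


def adversarial_generator_loss(disc_scores):
    """Least-squares generator loss (assumed form).

    Pushes the discriminator's scores on enhanced features toward 1 ("real").
    """
    return sum((s - 1.0) ** 2 for s in disc_scores) / len(disc_scores)


def sen_generator_loss(enhanced, clean, disc_scores, fm_weight=1.0):
    """Combined objective: adversarial term plus weighted FM term.

    fm_weight is a hypothetical hyperparameter balancing the two tasks.
    """
    adv = adversarial_generator_loss(disc_scores)
    fm = l1_feature_mapping_loss(enhanced, clean)
    return adv + fm_weight * fm


# Toy usage: two frames of two filter-bank coefficients each.
enhanced = [[1.0, 2.0], [3.0, 4.0]]
clean = [[1.0, 1.0], [3.0, 2.0]]
disc_scores = [1.0, 0.0]  # discriminator outputs on the enhanced frames
loss = sen_generator_loss(enhanced, clean, disc_scores, fm_weight=2.0)
```

The FM term needs paired reverberant-clean data, which is why it appears only in the supervised SEN setting. The unsupervised DAN is trained on unpaired data and relies on cycle consistency instead.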
