Deep Normalization for Speaker Vectors

Paper title

Deep Normalization for Speaker Vectors

Authors

Yunqi Cai, Lantian Li, Dong Wang, Andrew Abel

Abstract

Deep speaker embedding has demonstrated state-of-the-art performance in speaker recognition tasks. However, one potential issue with this approach is that the speaker vectors derived from deep embedding models tend to be non-Gaussian for each individual speaker, and non-homogeneous for distributions of different speakers. These irregular distributions can seriously impact speaker recognition performance, especially with the popular PLDA scoring method, which assumes homogeneous Gaussian distribution. In this paper, we argue that deep speaker vectors require deep normalization, and propose a deep normalization approach based on a novel discriminative normalization flow (DNF) model. We demonstrate the effectiveness of the proposed approach with experiments using the widely used SITW and CNCeleb corpora. In these experiments, the DNF-based normalization delivered substantial performance gains and also showed strong generalization capability in out-of-domain tests.

Paper title