Title
The IDLAB VoxCeleb Speaker Recognition Challenge 2020 System Description
Authors
Abstract
In this technical report we describe the IDLAB top-scoring submissions for the VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-20) in the supervised and unsupervised speaker verification tracks. For the supervised verification tracks we trained 6 state-of-the-art ECAPA-TDNN systems and 4 ResNet34-based systems with architectural variations. On all models we apply a large margin fine-tuning strategy, which enables the training procedure to use higher margin penalties by using longer training utterances. In addition, we use quality-aware score calibration, which introduces quality metrics into the calibration system to generate more consistent scores across varying utterance conditions. A fusion of all systems with both enhancements applied led to first place on the open and closed supervised verification tracks. The unsupervised system is trained through contrastive learning. Subsequent pseudo-label generation by iterative clustering of the training embeddings allows the use of supervised techniques. This procedure led to the winning submission on the unsupervised track, and its performance is closing in on that of supervised training.
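Quality-aware score calibration, as summarized above, can be illustrated as logistic regression on the raw verification score plus a quality term. The following is a minimal sketch, not the authors' implementation. It assumes a single hypothetical quality metric, the log duration of the shorter utterance in a trial, and fits the calibration weights with plain batch gradient descent. The function names (`quality_features`, `train_calibration`, `calibrated_score`, `log_loss`) and the toy trials are illustrative only.

```python
import math


def quality_features(score, dur_enroll, dur_test):
    """Build the calibration feature vector for one trial.

    Hypothetical quality metric (an assumption, not from the report):
    log duration in seconds of the shorter utterance in the trial.
    A constant 1.0 is appended as the bias input.
    """
    q = math.log(min(dur_enroll, dur_test))
    return [score, q, 1.0]


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def calibrated_score(w, x):
    # Calibrated log-odds: w_s * score + w_q * quality + b
    return sum(wi * xi for wi, xi in zip(w, x))


def log_loss(w, feats, labels):
    # Mean binary cross-entropy of the calibrated scores.
    total = 0.0
    for x, y in zip(feats, labels):
        p = _sigmoid(calibrated_score(w, x))
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(feats)


def train_calibration(feats, labels, lr=0.1, epochs=2000):
    """Fit [w_s, w_q, b] by batch gradient descent on the logistic loss."""
    w = [0.0] * len(feats[0])
    n = len(feats)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(feats, labels):
            err = _sigmoid(calibrated_score(w, x)) - y  # dLoss/dz for this trial
            for i, xi in enumerate(x):
                grad[i] += err * xi
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


# Toy trials: (raw cosine score, enrollment duration s, test duration s, is_target)
trials = [
    (0.7, 10, 10, 1), (0.6, 3, 8, 1), (0.8, 20, 5, 1), (0.5, 2, 4, 1),
    (0.1, 10, 10, 0), (0.2, 3, 8, 0), (0.0, 20, 5, 0), (0.3, 2, 4, 0),
]
feats = [quality_features(s, de, dt) for s, de, dt, _ in trials]
labels = [t for *_, t in trials]
w = train_calibration(feats, labels)
print("weights [w_s, w_q, b]:", w)
print("loss before/after:", log_loss([0.0, 0.0, 0.0], feats, labels), log_loss(w, feats, labels))
```

Because the quality term enters the log-odds additively, two trials with the same raw score but different utterance durations can receive different calibrated scores. This is one way a calibration stage can make scores more consistent across utterance conditions. Which quality metrics to use, and how many, is a design choice; duration here is only an example.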