Paper title
A Speaker Verification Backend for Improved Calibration Performance across Varying Conditions
Paper authors
Paper abstract
In a recent work, we presented a discriminative backend for speaker verification that achieved good out-of-the-box calibration performance on most tested conditions, which contained varying levels of mismatch to the training conditions. This backend mimics the standard PLDA-based backend process used in most current speaker verification systems, including the calibration stage. All parameters of the backend are jointly trained to optimize the binary cross-entropy for the speaker verification task. Calibration robustness is achieved by making the parameters of the calibration stage a function of vectors representing the conditions of the signal, which are extracted using a model trained to predict condition labels. In this work, we propose a simplified version of this backend where the vectors used to compute the calibration parameters are estimated within the backend, without the need for a condition prediction model. We show that this simplified method provides similar performance to the previously proposed method while being simpler to implement and having fewer requirements on the training data. Further, we provide an analysis of different aspects of the method, including the effect of initialization, the nature of the vectors used to compute the calibration parameters, and the effect that the random seed and the number of training epochs have on performance. We also compare the proposed method with the trial-based calibration (TBC) method that, to our knowledge, was the state-of-the-art method for achieving good calibration across varying conditions. We show that the proposed method outperforms TBC while also being several orders of magnitude faster to run, with a run time comparable to that of the standard PLDA baseline.
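As an illustration of the idea of a calibration stage whose parameters depend on condition vectors, the following is a minimal NumPy sketch. It is not the paper's implementation: the function names, the parameter dictionary, and in particular the symmetric bilinear-plus-linear form used for the scale and offset are assumptions made here for illustration, since the abstract does not specify the functional form. The second function is the standard binary cross-entropy on log-likelihood ratios, which is the kind of objective the abstract says the backend is trained with.

```python
import numpy as np


def condition_dependent_calibration(raw_score, z_enroll, z_test, params):
    """Map a raw score to a calibrated LLR: llr = alpha * score + beta.

    alpha and beta are functions of the two condition vectors rather than
    global constants. The form below is a hypothetical choice (assumption),
    symmetric in the enrollment and test sides.
    """
    def side_info_function(W, v, c):
        # Symmetrize W so that swapping the two sides gives the same value.
        W_sym = 0.5 * (W + W.T)
        return z_enroll @ W_sym @ z_test + v @ (z_enroll + z_test) + c

    alpha = side_info_function(params["W_a"], params["v_a"], params["c_a"])
    beta = side_info_function(params["W_b"], params["v_b"], params["c_b"])
    return alpha * raw_score + beta


def binary_cross_entropy(llrs, labels, prior_logit=0.0):
    """Mean binary cross-entropy of LLRs (labels: 1 = target, 0 = impostor).

    prior_logit is the log-odds of the assumed target prior (assumption:
    0.0, i.e. equal priors, unless stated otherwise).
    """
    logits = np.asarray(llrs, dtype=float) + prior_logit
    labels = np.asarray(labels, dtype=float)
    # log(1 + exp(x)) computed stably with logaddexp(0, x).
    loss = labels * np.logaddexp(0.0, -logits) + (1.0 - labels) * np.logaddexp(0.0, logits)
    return loss.mean()
```

With all matrices and vectors set to zero, `c_a = 1`, and `c_b = 0`, the sketch reduces to a pass-through of the raw score, which corresponds to a condition-independent (global) calibration with unit scale and zero offset. In a trained backend, these parameters and the condition vectors themselves would be learned jointly by minimizing the cross-entropy.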