Paper Title
Unifying Cosine and PLDA Back-ends for Speaker Verification
Paper Authors
Paper Abstract
State-of-the-art speaker verification (SV) systems use a back-end model to score the similarity of speaker embeddings extracted from a neural network model. The commonly used back-end models are cosine scoring and probabilistic linear discriminant analysis (PLDA) scoring. With recently developed neural embeddings, the theoretically more appealing PLDA approach is found to have no advantage over, or even to be inferior to, the simple cosine scoring in terms of SV system performance. This paper presents an investigation into the relation between the two scoring approaches, aiming to explain the above counter-intuitive observation. It is shown that cosine scoring is essentially a special case of PLDA scoring; in other words, by properly setting the parameters of PLDA, the two back-ends become equivalent. As a consequence, cosine scoring not only inherits the basic assumptions of PLDA but also introduces additional assumptions on the properties of the input embeddings. Experiments show that the dimensional independence assumption required by cosine scoring contributes most to the performance gap between the two methods under the domain-matched condition. When there is severe domain mismatch and the dimensional independence assumption does not hold, PLDA performs better than cosine scoring for domain adaptation.
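To make the claimed relationship concrete, the following is a minimal numerical sketch, not taken from the paper; the function names, the isotropic covariance values, and the toy embeddings are illustrative assumptions. Under a two-covariance PLDA model with isotropic between- and within-speaker covariances, a zero global mean, and length-normalised embeddings, the PLDA log-likelihood ratio becomes an increasing affine function of the cosine score, which is one sense in which cosine scoring can be viewed as a special case of PLDA.

import numpy as np

def plda_llr(x1, x2, B, W):
    # Two-covariance PLDA log-likelihood ratio (up to an additive constant).
    # B: between-speaker covariance, W: within-speaker covariance; zero global mean assumed.
    T = B + W
    same = np.block([[T, B], [B, T]])                      # joint covariance, same-speaker hypothesis
    diff = np.block([[T, np.zeros_like(B)],
                     [np.zeros_like(B), T]])               # joint covariance, different-speaker hypothesis
    x = np.concatenate([x1, x2])
    return -0.5 * x @ np.linalg.inv(same) @ x + 0.5 * x @ np.linalg.inv(diff) @ x

def cosine(x1, x2):
    return x1 @ x2 / (np.linalg.norm(x1) * np.linalg.norm(x2))

# With isotropic covariances and length-normalised embeddings, the PLDA LLR is
# an increasing affine function of the cosine score, so the two back-ends rank
# trials identically.
rng = np.random.default_rng(0)
d = 16
B = 0.4 * np.eye(d)   # illustrative isotropic between-speaker covariance (assumption)
W = 0.6 * np.eye(d)   # illustrative isotropic within-speaker covariance (assumption)
for _ in range(5):
    a, b = rng.normal(size=d), rng.normal(size=d)
    a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)    # length normalisation
    print(f"cosine={cosine(a, b):+.3f}  plda_llr={plda_llr(a, b, B, W):+.3f}")

In this toy setting the printed PLDA scores differ from the cosine scores only by a positive scale and an offset; when the covariances are not isotropic, i.e. the dimensional independence assumption fails, the two scores are no longer order-equivalent.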