Mixture of Speaker-type PLDAs for Children's Speech Diarization

Paper title

Mixture of Speaker-type PLDAs for Children's Speech Diarization

Authors

Jiamin Xie, Suzanna Sia, Paola Garcia, Daniel Povey, Sanjeev Khudanpur

Abstract

In diarization, the PLDA is typically used to model an inference structure that assumes the variation in speech segments is induced by different speakers. The speaker variation is then learned from the training data. However, human perception can differentiate speakers by age and gender, among other characteristics. In this paper, we investigate a speaker-type informed model that explicitly captures the known variation of speakers. We explore a mixture of three PLDA models, where each model represents an adult female, adult male, or child category. The weighting of each model is decided by the prior probability of its respective class, which we study. The evaluation is performed on a subset of the BabyTrain corpus. We examine the expected performance gain using the oracle speaker type labels, which yields an 11.7% reduction in diarization error rate (DER). We introduce a novel baby vocalization augmentation technique and then compare the mixture model to the single model. Our experimental results show an effective 0.9% DER reduction obtained by adding vocalizations. We discover empirically that a balanced dataset is important for training the mixture PLDA model, which outperforms the single PLDA by 1.3% using the same training data and achieves a 35.8% DER. The same setup improves over a standard baseline by 2.8% DER.
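The abstract does not state how the three speaker-type PLDA models are combined beyond weighting each by its class prior. As a rough illustration only, the sketch below assumes each type-specific model returns a log-likelihood for a pair of segments and mixes them as log Σ_k π_k·exp(s_k). The function name `mixture_score`, the type labels, and the example numbers are all hypothetical and are not taken from the paper.

```python
import math


def mixture_score(class_scores, class_priors):
    """Combine per-type PLDA log-likelihoods into one mixture log-likelihood.

    class_scores: dict mapping a speaker type to the log-likelihood produced
                  by that type's PLDA model (hypothetical inputs).
    class_priors: dict mapping the same speaker types to prior probabilities
                  that sum to 1.
    Returns log(sum_k prior_k * exp(score_k)), computed stably.
    """
    total = sum(class_priors.values())
    if not math.isclose(total, 1.0, rel_tol=1e-6):
        raise ValueError("class priors must sum to 1")

    # Work in the log domain; skip classes with zero prior (log(0) is undefined).
    terms = [
        math.log(class_priors[k]) + class_scores[k]
        for k in class_scores
        if class_priors[k] > 0.0
    ]
    # Log-sum-exp trick: subtract the maximum before exponentiating.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Illustrative values only: made-up scores and priors for the three types.
scores = {"female": -1.2, "male": -3.5, "child": -0.4}
priors = {"female": 0.4, "male": 0.3, "child": 0.3}
print(mixture_score(scores, priors))
```

Under this assumed rule, a uniform prior treats the three models equally, while a prior concentrated on one type reduces the mixture to that single type-specific model.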
