Domain Adaptation Using Class Similarity for Robust Speech Recognition

Paper Title

Domain Adaptation Using Class Similarity for Robust Speech Recognition

Authors

Zhu, Han; Zhao, Jiangjiang; Ren, Yuling; Wang, Li; Zhang, Pengyuan

Abstract

When only limited target domain data is available, domain adaptation can improve the performance of a deep neural network (DNN) acoustic model by leveraging a well-trained source model together with the target domain data. However, domain mismatch and data sparsity make domain adaptation very challenging. This paper proposes a novel adaptation method for DNN acoustic models that uses class similarity. Because the output distribution of a DNN model encodes knowledge of the similarity among classes, and this knowledge applies to both the source and target domains, it can be transferred from the source model to the target model to improve performance. In our approach, we first compute the frame-level posterior probabilities of source samples using the source model. Then, for each class, the probabilities of that class are used to compute a mean vector, which we refer to as the mean soft label. During adaptation, these mean soft labels are used in a regularization term to train the target model. Experiments show that our approach outperforms fine-tuning with one-hot labels on both accent and noise adaptation tasks, especially when the source and target domains are highly mismatched.
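The two steps described in the abstract, averaging source posteriors per class and then using the averages as a regularizer, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact loss, so the interpolation between a one-hot cross-entropy term and a soft-label cross-entropy term, the `weight` parameter, the one-hot fallback for unseen classes, and all function names are hypothetical. It also assumes that "probabilities of this class" means the posterior vectors of frames whose reference label is that class.

```python
import numpy as np


def mean_soft_labels(posteriors, labels, num_classes):
    """Average source-model posteriors over the frames of each class.

    posteriors: (num_frames, num_classes) array, each row sums to 1.
    labels:     (num_frames,) integer reference class of each frame.
    Returns a (num_classes, num_classes) array; row c is the mean soft label of class c.
    """
    means = np.zeros((num_classes, num_classes))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            means[c] = posteriors[mask].mean(axis=0)
        else:
            # Assumption: fall back to a one-hot label for classes with no source frames.
            means[c, c] = 1.0
    return means


def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def adaptation_loss(target_logits, labels, means, weight=0.5):
    """One-hot cross-entropy plus a mean-soft-label regularization term.

    Assumption: the two terms are linearly interpolated by `weight`;
    the abstract only says the soft labels appear in a regularization term.
    """
    log_probs = log_softmax(target_logits)
    n = len(labels)
    hard = -log_probs[np.arange(n), labels].mean()          # one-hot term
    soft = -(means[labels] * log_probs).sum(axis=1).mean()  # soft-label term
    return (1.0 - weight) * hard + weight * soft


# Toy usage: three source frames, two classes.
src_post = np.array([[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]])
src_labels = np.array([0, 0, 1])
soft_labels = mean_soft_labels(src_post, src_labels, num_classes=2)
```

In this sketch the row for class 0 spreads some probability mass onto class 1, which is how the similarity between classes learned by the source model reaches the target model during adaptation.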
