Data Techniques for Online End-to-End Speech Recognition

Paper Title

Data Techniques For Online End-to-end Speech Recognition

Authors

Chen, Yang; Wang, Weiran; Chen, I-Fan; Wang, Chao

Abstract

Practitioners often need to build ASR systems for new use cases in a short amount of time, given limited in-domain data. While recently developed end-to-end methods largely simplify the modeling pipelines, they still suffer from the data sparsity issue. In this work, we explore a few simple-to-implement techniques for building online ASR systems in an end-to-end fashion, with a small amount of transcribed data in the target domain. These techniques include data augmentation in the target domain, domain adaptation using models previously trained on a large source domain, and knowledge distillation on non-transcribed target domain data, using an adapted bi-directional model as the teacher; they are applicable in real scenarios with different types of resources. Our experiments demonstrate that each technique is independently useful in the improvement of the online ASR performance in the target domain.
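As a rough illustration of the knowledge distillation idea mentioned in the abstract, the sketch below computes a frame-level distillation loss between a teacher and a student on one untranscribed utterance. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which distillation objective is used, so the per-frame KL divergence, the `temperature` parameter, and the names `log_softmax` and `distillation_loss` are all hypothetical choices made here for illustration. The teacher logits stand in for outputs of the adapted bi-directional model, and the student logits for outputs of the online model.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over one frame of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_sum for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean per-frame KL(teacher || student) for one utterance.

    Assumption (not from the paper): both models emit one logit vector per
    frame over the same output units, and frames are already aligned.
    No transcript is needed, only the teacher's soft targets.
    """
    if not teacher_logits or len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student need the same non-zero frame count")
    total = 0.0
    for teacher_frame, student_frame in zip(teacher_logits, student_logits):
        log_p = log_softmax(teacher_frame, temperature)  # teacher soft targets
        log_q = log_softmax(student_frame, temperature)  # student predictions
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(teacher_logits)


if __name__ == "__main__":
    # Toy two-frame utterance with three output units (made-up numbers).
    teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
    student = [[1.0, 0.2, -0.5], [0.0, 0.9, 0.4]]
    print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's per-frame distribution and grows as the two diverge, which is what lets untranscribed target-domain audio contribute a training signal.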
