Paper Title

Training Autoregressive Speech Recognition Models with Limited in-domain Supervision

Paper Authors

Chak-Fai Li, Francis Keith, William Hartmann, Matthew Snover

Paper Abstract

Advances in self-supervised learning have significantly reduced the amount of transcribed audio required for training. However, the majority of work in this area is focused on read speech. We explore limited supervision in the domain of conversational speech. While we assume the amount of in-domain data is limited, we augment the model with open-source read speech data. The XLS-R model has been shown to perform well with limited adaptation data and serves as a strong baseline. We use untranscribed data for self-supervised learning and semi-supervised training in an autoregressive encoder-decoder model. We demonstrate that by using the XLS-R model for pseudo-transcription, a much smaller autoregressive model can outperform a fine-tuned XLS-R model when transcribed in-domain data is limited, reducing WER by as much as 8% absolute.
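The semi-supervised step described in the abstract amounts to pseudo-labeling: a fine-tuned XLS-R model decodes the untranscribed conversational audio, and the resulting pseudo-transcripts then supervise the smaller autoregressive encoder-decoder model. Below is a minimal sketch of that pseudo-transcription pass using HuggingFace transformers; the checkpoint name and greedy CTC decoding are illustrative assumptions, not the authors' exact pipeline.

```python
# Pseudo-transcription sketch (assumption: a CTC-fine-tuned XLS-R checkpoint
# exists at CHECKPOINT; "your-org/xls-r-300m-ctc" is a hypothetical name).
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

CHECKPOINT = "your-org/xls-r-300m-ctc"  # placeholder, not from the paper

processor = Wav2Vec2Processor.from_pretrained(CHECKPOINT)
model = Wav2Vec2ForCTC.from_pretrained(CHECKPOINT).eval()

def pseudo_transcribe(wav_path: str) -> str:
    """Greedily decode one untranscribed utterance into a pseudo-transcript."""
    waveform, sr = torchaudio.load(wav_path)
    if sr != 16_000:  # XLS-R expects 16 kHz audio
        waveform = torchaudio.functional.resample(waveform, sr, 16_000)
    inputs = processor(waveform.squeeze(0), sampling_rate=16_000,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    ids = torch.argmax(logits, dim=-1)  # greedy CTC decoding
    return processor.batch_decode(ids)[0]
```

The resulting (audio, pseudo-transcript) pairs would then be mixed with the limited transcribed in-domain data as training targets for the smaller autoregressive model.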
