A Conformer Based Acoustic Model for Robust Automatic Speech Recognition

Paper Title

A Conformer Based Acoustic Model for Robust Automatic Speech Recognition

Authors

Yufeng Yang, Peidong Wang, DeLiang Wang

Abstract

This study addresses robust automatic speech recognition (ASR) by introducing a Conformer-based acoustic model. The proposed model builds on the wide residual bi-directional long short-term memory network (WRBN) with utterance-wise dropout and iterative speaker adaptation, but employs a Conformer encoder instead of the recurrent network. The Conformer encoder uses a convolution-augmented attention mechanism for acoustic modeling. The proposed system is evaluated on the monaural ASR task of the CHiME-4 corpus. Coupled with utterance-wise normalization and speaker adaptation, our model achieves a word error rate of $6.25\%$, a relative improvement of $8.4\%$ over WRBN. In addition, the proposed Conformer-based model is $18.3\%$ smaller in model size and reduces total training time by $79.6\%$.
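
As a reading aid for the reported numbers, the sketch below shows how a relative word error rate (WER) improvement relates to absolute WERs. The abstract states only the Conformer WER ($6.25\%$) and the relative improvement ($8.4\%$); the WRBN baseline computed here is inferred from those two figures and is not quoted from the paper. The function names are illustrative.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Relative reduction of `new` with respect to `baseline`, as a fraction."""
    return (baseline - new) / baseline


def implied_baseline(new: float, rel_reduction: float) -> float:
    """Baseline value implied by a new value and a relative reduction."""
    return new / (1.0 - rel_reduction)


# Figures stated in the abstract.
conformer_wer = 6.25   # percent
rel_improvement = 0.084  # 8.4% relative

# Inferred, not stated in the abstract: the WRBN WER implied by the two figures.
wrbn_wer_implied = implied_baseline(conformer_wer, rel_improvement)

print(f"Implied WRBN WER: {wrbn_wer_implied:.2f}%")
print(f"Check: {relative_reduction(wrbn_wer_implied, conformer_wer):.3f}")
```

Under these assumptions the implied baseline is roughly $6.8\%$, so the absolute gap between the two systems is a little over half a percentage point.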
