Paper Title
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Paper Authors
Paper Abstract
Conformer, which combines convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state-of-the-art for automatic speech recognition (ASR). Several other studies have explored integrating convolution and self-attention, but they have not managed to match Conformer's performance. The recently introduced Branchformer achieves performance comparable to Conformer by using dedicated branches for convolution and self-attention and merging the local and global context from each branch. In this paper, we propose E-Branchformer, which enhances Branchformer by applying an effective merging method and stacking additional point-wise modules. E-Branchformer sets new state-of-the-art word error rates (WERs) of 1.81% and 3.65% on the LibriSpeech test-clean and test-other sets without using any external training data.
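The abstract describes a two-branch layer (a self-attention branch for global context, a convolution branch for local context) whose outputs are merged. Below is a minimal, illustrative PyTorch sketch of such a merge step, assuming the enhanced merging concatenates the two branch outputs and applies a depth-wise convolution before projecting back to the model dimension; the class name, kernel size, and tensor shapes are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class TwoBranchMerge(nn.Module):
    """Illustrative sketch (not the authors' code): merge a global
    (self-attention) branch and a local (convolution) branch by
    concatenating them and applying a depth-wise convolution,
    in the spirit of E-Branchformer's enhanced merging."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        # depth-wise conv mixes neighboring frames of the concatenated branches
        self.dw_conv = nn.Conv1d(
            2 * dim, 2 * dim, kernel_size,
            padding=kernel_size // 2, groups=2 * dim,
        )
        # linear projection back to the model dimension
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, attn_out: torch.Tensor, conv_out: torch.Tensor) -> torch.Tensor:
        # attn_out, conv_out: (batch, time, dim)
        x = torch.cat([attn_out, conv_out], dim=-1)          # (B, T, 2*dim)
        x = self.dw_conv(x.transpose(1, 2)).transpose(1, 2)  # temporal mixing
        return self.proj(x)                                  # (B, T, dim)


# usage with hypothetical branch outputs
merge = TwoBranchMerge(dim=256)
attn_branch = torch.randn(4, 100, 256)  # stand-in for self-attention output
conv_branch = torch.randn(4, 100, 256)  # stand-in for convolution output
merged = merge(attn_branch, conv_branch)
print(merged.shape)  # torch.Size([4, 100, 256])
```

The depth-wise convolution here lets the merge see a small temporal neighborhood of both branches before projection, which is the kind of "enhanced merging" the abstract contrasts with a plain concatenate-and-project step.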