Paper Title
E-Branchformer: Branchformer with Enhanced merging for speech recognition
Paper Authors
Paper Abstract
Conformer, which combines convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state-of-the-art for automatic speech recognition (ASR). Several other studies have explored integrating convolution and self-attention, but they have not managed to match Conformer's performance. The recently introduced Branchformer achieves performance comparable to Conformer by using dedicated branches for convolution and self-attention and merging the local and global context from each branch. In this paper, we propose E-Branchformer, which enhances Branchformer by applying an effective merging method and stacking additional point-wise modules. E-Branchformer sets new state-of-the-art word error rates (WERs) of 1.81% and 3.65% on the LibriSpeech test-clean and test-other sets without using any external training data.
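The abstract describes a two-branch layer (a self-attention branch for global context, a convolution branch for local context) whose outputs are merged. Below is a minimal, illustrative PyTorch sketch of such a merge step, assuming the enhanced merging concatenates the two branch outputs and applies a depth-wise convolution before projecting back to the model dimension; the class name, kernel size, and tensor shapes are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class TwoBranchMerge(nn.Module):
    """Illustrative sketch (not the authors' code): merge a global
    (self-attention) branch and a local (convolution) branch by
    concatenating them and applying a depth-wise convolution,
    in the spirit of E-Branchformer's enhanced merging."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        # depth-wise conv mixes neighboring frames of the concatenated branches
        self.dw_conv = nn.Conv1d(
            2 * dim, 2 * dim, kernel_size,
            padding=kernel_size // 2, groups=2 * dim,
        )
        # linear projection back to the model dimension
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, attn_out: torch.Tensor, conv_out: torch.Tensor) -> torch.Tensor:
        # attn_out, conv_out: (batch, time, dim)
        x = torch.cat([attn_out, conv_out], dim=-1)          # (B, T, 2*dim)
        x = self.dw_conv(x.transpose(1, 2)).transpose(1, 2)  # temporal mixing
        return self.proj(x)                                  # (B, T, dim)


# usage with hypothetical branch outputs
merge = TwoBranchMerge(dim=256)
attn_branch = torch.randn(4, 100, 256)  # stand-in for self-attention output
conv_branch = torch.randn(4, 100, 256)  # stand-in for convolution output
merged = merge(attn_branch, conv_branch)
print(merged.shape)  # torch.Size([4, 100, 256])
```

The depth-wise convolution here lets the merge see a small temporal neighborhood of both branches before projection, which is the kind of "enhanced merging" the abstract contrasts with a plain concatenate-and-project step.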