Paper Title

Locality Matters: A Locality-Biased Linear Attention for Automatic Speech Recognition

Paper Authors

Jingyu Sun, Guiping Zhong, Dinghao Zhou, Baoxiang Li, Yiran Zhong

Paper Abstract

Conformer has shown great success in automatic speech recognition (ASR) on many public benchmarks. One of its crucial drawbacks is the quadratic time-space complexity with respect to the input sequence length, which prevents the model from scaling up and from processing longer input audio sequences. To solve this issue, numerous linear attention methods have been proposed. However, these methods often have limited performance on ASR, as they treat tokens equally in modeling and neglect the fact that neighbouring tokens are often more connected than distant tokens. In this paper, we take this fact into account and propose a new locality-biased linear attention for Conformer. It not only achieves higher accuracy than the vanilla Conformer but also enjoys linear space-time computational complexity. Specifically, we replace the softmax attention in Conformer blocks with a locality-biased linear attention (LBLA) mechanism. The LBLA contains a kernel function that ensures linear complexity and a cosine reweighting matrix that imposes larger weights on neighbouring tokens. Extensive experiments on the LibriSpeech corpus show that by introducing this locality bias into the Conformer, our method achieves a lower word error rate with more than 22% faster inference speed.
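For intuition, below is a minimal sketch of how a kernel-plus-cosine-reweighting attention of this kind can run in linear time. It assumes a ReLU feature map and a cosFormer-style decomposition of the cosine weight; the function name `lbla` and all implementation details are illustrative assumptions, not the paper's released code.

```python
import math
import torch
import torch.nn.functional as F

def lbla(q, k, v, eps=1e-6):
    """Hypothetical sketch of a locality-biased linear attention.

    q, k, v: (n, d) tensors for a single head. Complexity is O(n * d^2);
    no n x n attention matrix is ever materialized.
    """
    n = q.shape[0]

    # Kernel function phi: ReLU keeps similarity scores non-negative,
    # a common choice in linear attention (an assumption, not from the paper).
    q, k = F.relu(q), F.relu(k)

    # Locality bias: weight(i, j) = cos(pi/2 * (i - j) / n) decays as
    # |i - j| grows. Via cos(a - b) = cos(a)cos(b) + sin(a)sin(b), the
    # bias factorizes over i and j, so linear complexity is preserved.
    pos = math.pi / 2 * torch.arange(n, dtype=q.dtype).unsqueeze(1) / n
    q_cos, q_sin = q * torch.cos(pos), q * torch.sin(pos)
    k_cos, k_sin = k * torch.cos(pos), k * torch.sin(pos)

    # Contract K with V first: two d x d summaries instead of n x n scores.
    num = q_cos @ (k_cos.T @ v) + q_sin @ (k_sin.T @ v)  # (n, d)

    # Normalizer, playing the role of the softmax denominator.
    z = q_cos @ k_cos.sum(0, keepdim=True).T + q_sin @ k_sin.sum(0, keepdim=True).T
    return num / (z + eps)

# Example: attention over a 1000-step sequence with 64-dim heads.
out = lbla(torch.randn(1000, 64), torch.randn(1000, 64), torch.randn(1000, 64))
```

The key design point is that the cosine reweighting factorizes into per-position terms, so the locality bias costs nothing in asymptotic complexity: the computation stays a sequence of (n, d) by (d, d) products rather than an (n, n) score matrix.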
