Paper Title
Supervised Attention in Sequence-to-Sequence Models for Speech Recognition
Paper Authors
Abstract
The attention mechanism in sequence-to-sequence models is designed to model the alignments between acoustic features and output tokens in speech recognition. However, attention weights produced by models trained end-to-end do not always correspond well with actual alignments, and several studies have further argued that attention weights might not even correspond well with the relevance attribution of frames. Regardless, the visual similarity between attention weights and alignments is widely used during training as an indicator of model quality. In this paper, we treat the correspondence between attention weights and alignments as a learning problem by imposing a supervised attention loss. Experiments show significantly improved performance, suggesting that learning the alignments well during training critically determines the performance of sequence-to-sequence models.
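The abstract does not specify the exact form of the supervised attention loss, but a common choice for this kind of objective is a cross-entropy between each output token's attention distribution and a reference alignment (e.g. one-hot targets from a forced aligner). The sketch below is a minimal illustration under that assumption; the function name and the toy shapes are hypothetical, not taken from the paper.

```python
import numpy as np

def supervised_attention_loss(attn_weights, alignment, eps=1e-8):
    """Cross-entropy between predicted attention and a reference alignment.

    attn_weights: (T_out, T_in) array; each row is the model's attention
                  distribution over input frames for one output token.
    alignment:    (T_out, T_in) array; each row is the reference alignment
                  distribution (e.g. one-hot from a forced aligner).
    Returns the mean per-token cross-entropy.
    """
    # eps avoids log(0) when the model assigns zero weight to the target frame
    return float(-np.sum(alignment * np.log(attn_weights + eps))
                 / attn_weights.shape[0])

# Toy example: 3 output tokens attending over 4 input frames.
attn = np.array([[0.7, 0.1, 0.1, 0.1],
                 [0.1, 0.7, 0.1, 0.1],
                 [0.1, 0.1, 0.1, 0.7]])
ref = np.eye(4)[[0, 1, 3]]  # one-hot reference alignment per token
loss = supervised_attention_loss(attn, ref)  # ≈ -log(0.7) ≈ 0.357
```

In practice this term would be added, with some weight, to the usual sequence-to-sequence training loss, so the model is encouraged to place attention mass on the frames the aligner associates with each token.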