Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition

Paper Title

Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition

Authors

Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng

Abstract

Automatic speech recognition (ASR) systems degrade significantly under noisy conditions. Recently, speech enhancement (SE) has been introduced as a front-end to reduce noise for ASR, but it also suppresses some important speech information, a problem known as over-suppression. To alleviate this, we propose a dual-path style learning approach for end-to-end noise-robust speech recognition (DPSL-ASR). Specifically, we first introduce the clean speech feature along with the fused feature from IFF-Net as dual-path inputs to recover the suppressed information. Then, we propose style learning to map the fused feature close to the clean feature, in order to learn latent speech information from the latter, i.e., the clean "speech style". Furthermore, we also minimize the distance between the final ASR outputs of the two paths to improve noise-robustness. Experiments show that the proposed approach achieves relative word error rate (WER) reductions of 10.6% and 8.6% over the best IFF-Net baseline on the RATS and CHiME-4 datasets, respectively.
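
The two training signals named in the abstract (pulling the fused feature toward the clean feature, and reducing the distance between the ASR outputs of the two paths) can be illustrated with a minimal sketch. The abstract does not specify the distance measures or how the terms are weighted, so everything concrete here is an assumption for illustration only: mean squared error for the feature term, KL divergence for the output term, and the hypothetical names `dual_path_loss`, `alpha`, and `beta`. It is not the authors' implementation.

```python
import math


def mse(a, b):
    """Mean squared error between two feature maps given as lists of frames."""
    total, count = 0.0, 0
    for frame_a, frame_b in zip(a, b):
        for x, y in zip(frame_a, frame_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def softmax(logits):
    """Numerically stable softmax over one frame of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def dual_path_loss(fused_feat, clean_feat, fused_logits, clean_logits,
                   asr_loss, alpha=1.0, beta=1.0):
    """Combine an ASR loss with two auxiliary terms (all choices assumed).

    style:       MSE between fused and clean features (assumed distance).
    consistency: frame-averaged KL between the clean-path and fused-path
                 output distributions (assumed distance).
    alpha, beta: hypothetical weights, not taken from the paper.
    """
    style = mse(fused_feat, clean_feat)
    consistency = sum(
        kl_divergence(softmax(c), softmax(f))
        for c, f in zip(clean_logits, fused_logits)
    ) / len(fused_logits)
    return asr_loss + alpha * style + beta * consistency
```

When both paths produce identical features and outputs, the two auxiliary terms vanish and the total reduces to the ASR loss alone; any mismatch between the paths adds a non-negative penalty.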
