Spatial Processing Front-End For Distant ASR Exploiting Self-Attention Channel Combinator

Paper title

Spatial Processing Front-End For Distant ASR Exploiting Self-Attention Channel Combinator

Authors

Dushyant Sharma, Rong Gong, James Fosburgh, Stanislav Yu. Kruchinin, Patrick A. Naylor, Ljubomir Milanovic

Abstract

We present a novel multi-channel front-end for tackling the distant ASR problem. It is based on channel shortening with the Weighted Prediction Error (WPE) method, followed by a fixed MVDR beamformer used in combination with a recently proposed self-attention-based channel combination (SACC) scheme. We show that the proposed system, used as part of a ContextNet-based end-to-end (E2E) ASR system, outperforms leading ASR systems, achieving a 21.6% relative WER reduction on a multi-channel LibriSpeech playback dataset. We also show that dereverberation prior to beamforming is beneficial, and we compare the WPE method with a modified neural channel shortening approach. An analysis of non-intrusive estimates of the signal C50 confirms that the 8-channel WPE method provides significant dereverberation of the signals (a 13.6 dB improvement). Finally, we show how the weights of the SACC system allow the extraction of accurate spatial information, which can benefit other speech processing applications such as diarization.
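The abstract names a fixed MVDR beamformer but does not describe its design. The sketch below is a minimal illustration, not the paper's implementation: it computes per-frequency MVDR weights w = R^{-1} d / (d^H R^{-1} d) in plain Python. It assumes a hypothetical 8-microphone uniform linear array with 3 cm spacing, a far-field steering vector, and a spherically diffuse noise coherence matrix with diagonal loading as R. All of these (geometry, look direction, loading value, and the helper names `ula_steering` and `diffuse_coherence`) are illustrative assumptions.

```python
import cmath
import math


def solve(A, b):
    """Solve A x = b for a small dense complex system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented copy; inputs stay untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / M[r][r]
    return x


def mvdr_weights(R, d):
    """MVDR weights w = R^{-1} d / (d^H R^{-1} d) for one frequency bin."""
    x = solve(R, d)
    denom = sum(di.conjugate() * xi for di, xi in zip(d, x))
    return [xi / denom for xi in x]


def ula_steering(num_mics, spacing, freq, angle_rad, c=343.0):
    """Far-field steering vector for a hypothetical uniform linear array."""
    return [cmath.exp(-2j * math.pi * freq * m * spacing * math.cos(angle_rad) / c)
            for m in range(num_mics)]


def diffuse_coherence(num_mics, spacing, freq, loading=1e-2, c=343.0):
    """Spherically diffuse noise coherence sin(x)/x, plus diagonal loading (assumed noise model)."""
    R = []
    for i in range(num_mics):
        row = []
        for j in range(num_mics):
            x = 2 * math.pi * freq * abs(i - j) * spacing / c
            gamma = 1.0 if x == 0 else math.sin(x) / x
            row.append(complex(gamma + (loading if i == j else 0.0)))
        R.append(row)
    return R


# Illustrative setup: 8 mics, 3 cm spacing, 1 kHz bin, look direction 60 degrees.
d = ula_steering(8, 0.03, 1000.0, math.radians(60))
R = diffuse_coherence(8, 0.03, 1000.0)
w = mvdr_weights(R, d)
response = sum(wi.conjugate() * di for wi, di in zip(w, d))  # w^H d
print(round(abs(response), 6))  # 1.0
```

The final check confirms the distortionless constraint w^H d = 1 in the look direction. Because a fixed beamformer depends only on the array geometry and the assumed noise field, its weights can be precomputed once per look direction and frequency bin.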
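The SACC scheme combines channels using weights produced by self-attention. The sketch below is a simplified illustration in that spirit, not the published SACC architecture. For one STFT frame it projects each channel's features (assumed here to be log-magnitude spectra) to queries and keys, computes a channel-by-channel scaled dot-product attention matrix with a row-wise softmax, averages its rows into one weight per channel, and uses those weights to sum the channels' complex STFT bins. The projection matrices would be learned in practice; the identity matrices, the row-averaging step, and the helper names are hypothetical choices made to keep the example self-contained.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(vec, matrix):
    """Row vector of length F times an F x D matrix."""
    return [sum(vec[f] * matrix[f][j] for f in range(len(vec))) for j in range(len(matrix[0]))]


def channel_weights(features, w_q, w_k):
    """One weight per channel from self-attention across channels (hypothetical simplification).

    features: C lists of length F for one frame; w_q, w_k: F x D projections.
    """
    queries = [project(x, w_q) for x in features]
    keys = [project(x, w_k) for x in features]
    dim = len(queries[0])
    scale = math.sqrt(dim)
    # C x C attention matrix; each row is a softmax over the key channels.
    attn = [softmax([sum(q[j] * k[j] for j in range(dim)) / scale for k in keys])
            for q in queries]
    num_ch = len(features)
    # Average the rows so that the channel weights are non-negative and sum to 1.
    return [sum(attn[r][c] for r in range(num_ch)) / num_ch for c in range(num_ch)]


def combine(stft_frame, weights):
    """Weighted sum of the complex STFT bins of all channels for one frame."""
    num_bins = len(stft_frame[0])
    return [sum(weights[c] * stft_frame[c][b] for c in range(len(weights)))
            for b in range(num_bins)]


# Toy frame: 3 channels, 4 frequency bins, identity projections (hypothetical).
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
log_mag = [[0.1, 0.4, 0.2, 0.0], [1.0, 1.2, 0.9, 0.8], [0.3, 0.2, 0.5, 0.1]]
weights = channel_weights(log_mag, identity, identity)
stft = [[complex(v, 0.0) for v in row] for row in log_mag]  # stand-in complex bins
enhanced = combine(stft, weights)
```

In this toy frame the channel with the strongest features receives the largest weight. Tracking which microphones dominate the weights frame by frame hints at how such weights could carry spatial information, which the abstract reports for the real SACC system.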
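The 21.6% figure is a relative reduction, measured against the baseline WER rather than in absolute percentage points. The helper below makes the arithmetic explicit. The WER values in the example are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer, system_wer):
    """Relative WER reduction in percent: 100 * (baseline - system) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Hypothetical numbers: a baseline at 10.0% WER improved to 7.84% WER.
print(round(relative_wer_reduction(10.0, 7.84), 2))  # 21.6
```

In this example the absolute improvement is only 2.16 points, yet it corresponds to a 21.6% relative reduction.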
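C50 (clarity) is the ratio, in dB, of the energy arriving in the first 50 ms of a room impulse response to the energy arriving after that: C50 = 10 log10(E_early / E_late). The paper uses a non-intrusive estimate computed from the signal alone. The sketch below instead computes the standard intrusive definition from a known RIR, which is the quantity such estimators approximate. Measuring the 50 ms boundary from the direct-path peak is an assumed convention.

```python
import math


def c50_db(rir, fs):
    """Clarity index C50 in dB from a room impulse response sampled at fs Hz.

    The 50 ms boundary is measured from the direct-path peak (assumed convention).
    """
    start = max(range(len(rir)), key=lambda n: abs(rir[n]))  # first largest sample
    split = start + int(round(0.050 * fs))
    early = sum(x * x for x in rir[start:split])
    late = sum(x * x for x in rir[split:])
    if late == 0.0:
        return math.inf  # no late energy: clarity is unbounded
    return 10.0 * math.log10(early / late)


# Toy RIR at fs = 1 kHz: 50 ms of unit samples, then 50 ms at amplitude 0.1.
toy_rir = [1.0] * 50 + [0.1] * 50
print(round(c50_db(toy_rir, 1000), 6))  # 20.0
```

Higher C50 means less late reverberation relative to the direct sound and early reflections, so an increase in the estimated C50 after processing indicates effective dereverberation.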
