Title
Robust Front-End for Multi-Channel ASR using Flow-Based Density Estimation
Authors
Abstract
For multi-channel speech recognition, speech enhancement techniques such as denoising or dereverberation are conventionally applied as a front-end processor. Deep learning-based front-ends using such techniques require aligned pairs of clean and noisy speech, which are generally obtained via data simulation. Recently, several joint optimization techniques have been proposed to train the front-end without parallel data within an end-to-end automatic speech recognition (ASR) scheme. However, the ASR objective is sub-optimal and insufficient for fully training the front-end, which still leaves room for improvement. In this paper, we propose a novel approach that incorporates flow-based density estimation for training a robust front-end using non-parallel clean and noisy speech. Experimental results on the CHiME-4 dataset show that the proposed method outperforms conventional techniques in which the front-end is trained only with the ASR objective.
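The abstract names flow-based density estimation without describing it. As a generic illustration only (not the authors' model or training setup), the sketch below shows the core idea: an invertible network f maps data x to a latent z with a simple base density, and the exact log-likelihood follows from the change-of-variables formula log p(x) = log p_z(f(x)) + log|det J_f(x)|. Because such a model is fit by maximum likelihood on samples from one distribution, it needs no paired data. The RealNVP-style affine coupling layer, the toy linear scale/shift maps, the standard normal base, and all function and class names here are assumptions chosen for illustration.

```python
import numpy as np

def standard_normal_logpdf(z):
    # Base density: sum over dimensions of log N(z_i; 0, 1)
    return -0.5 * np.sum(z**2 + np.log(2 * np.pi), axis=-1)

class AffineCoupling:
    """Hypothetical RealNVP-style affine coupling layer with toy linear maps."""

    def __init__(self, dim, rng):
        self.half = dim // 2
        # Tiny linear "networks" for log-scale and shift; a real model would use deep nets.
        self.w_s = rng.normal(scale=0.1, size=(self.half, dim - self.half))
        self.w_t = rng.normal(scale=0.1, size=(self.half, dim - self.half))

    def forward(self, x):
        x1, x2 = x[..., :self.half], x[..., self.half:]
        s = np.tanh(x1 @ self.w_s)    # log-scale, bounded for numerical stability
        t = x1 @ self.w_t             # shift
        z2 = x2 * np.exp(s) + t       # x1 passes through unchanged, so f is invertible
        log_det = np.sum(s, axis=-1)  # triangular Jacobian: log|det J| = sum of s
        return np.concatenate([x1, z2], axis=-1), log_det

def flow_log_likelihood(x, layers):
    # Change of variables: log p(x) = log p_z(f(x)) + sum of per-layer log|det J|
    z = x
    total_log_det = np.zeros(x.shape[:-1])
    for layer in layers:
        z, log_det = layer.forward(z)
        total_log_det += log_det
        z = z[..., ::-1]  # reverse features (|det| = 1) so the other half is transformed next
    return standard_normal_logpdf(z) + total_log_det

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    flow = [AffineCoupling(4, rng) for _ in range(3)]
    x = rng.normal(size=(8, 4))  # stand-in for 8 feature frames of dimension 4
    print(flow_log_likelihood(x, flow).shape)  # (8,)
```

In practice the layer parameters would be optimized to maximize this log-likelihood on training data; the tanh bound on the log-scale is one common choice for keeping the exponent stable.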