Paper Title
Preserving the beamforming effect for spatial cue-based pseudo-binaural dereverberation of a single source
Paper Authors
Paper Abstract
Reverberation is unavoidable in enclosures and reduces intelligibility for hearing-impaired and non-native listeners, and even for normal-hearing listeners in noisy conditions. It also degrades the performance of machine listening applications. In this paper, we propose a novel approach to binaural dereverberation of a single speech source that exploits the differences between the interaural cues of the direct-path signal and those of the reverberation. Two beamformers, spaced at an interaural distance, are used to extract the reverberation from the reverberant speech. The interaural cues generated by this reverberation and those generated by the direct-path signal form a two-class dataset used to train a U-Net (a deep convolutional neural network). After training, the beamformers are removed, and the trained U-Net, together with a maximum likelihood estimation (MLE) algorithm, discriminates the direct-path cues from the reverberation cues when the system is presented with the interaural spectrogram of the reverberant speech signal. Our proposed model has outperformed the classical signal-processing dereverberation model, weighted prediction error (WPE), in terms of cepstral distance (CEP), frequency-weighted segmental signal-to-noise ratio (FWSEGSNR), and signal-to-reverberation modulation energy ratio (SRMR) by 1.4 points, 8 dB, and 0.6 dB, respectively. It has also achieved better performance than a deep-learning-based dereverberation model, gaining a 1.3-point improvement in CEP with comparable FWSEGSNR, while using a training dataset almost 8 times smaller than that model requires. The proposed model also sustains its performance under relatively similar unseen acoustic conditions and at positions in the vicinity of its training position.
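To make the notion of an interaural spectrogram concrete, below is a minimal sketch (not the authors' code) of how interaural cues can be computed from a two-channel, pseudo-binaural signal: the interaural level difference (ILD) and interaural phase difference (IPD) per time-frequency bin. The frame and FFT sizes are illustrative assumptions, not values taken from the paper.

```python
# Sketch: interaural cue (ILD/IPD) extraction from a two-channel signal.
# Parameter values are illustrative assumptions, not the paper's settings.
import numpy as np
from scipy.signal import stft

def interaural_spectrogram(left, right, fs, n_fft=512, hop=256, eps=1e-8):
    """Return ILD (dB) and IPD (radians) for each time-frequency bin."""
    _, _, L = stft(left, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
    _, _, R = stft(right, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))  # level cue
    ipd = np.angle(L * np.conj(R))                                # phase cue
    return ild, ipd
```

Cues computed this way from a direct-path-dominant signal and from a reverberation-only signal (e.g. a beamformer output) could then be stacked as the two classes used to train a network such as a U-Net, in the spirit of the approach described in the abstract.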