Instantaneous PSD Estimation for Speech Enhancement based on Generalized Principal Components

Paper Title

Instantaneous PSD Estimation for Speech Enhancement based on Generalized Principal Components

Authors

Thomas Dietzen, Marc Moonen, Toon van Waterschoot

Abstract

Power spectral density (PSD) estimates of various microphone signal components are essential to many speech enhancement procedures. As speech is highly non-stationary, performance improvements may be gained by maintaining time variations in PSD estimates. In this paper, we propose an instantaneous PSD estimation approach based on generalized principal components. Similar to other eigenspace-based PSD estimation approaches, we rely on recursive averaging in order to obtain a microphone signal correlation matrix estimate to be decomposed. However, instead of estimating the PSDs directly from the temporally smooth generalized eigenvalues of this matrix, yielding temporally smooth PSD estimates, we propose to estimate the PSDs from newly defined instantaneous generalized eigenvalues, yielding instantaneous PSD estimates. The instantaneous generalized eigenvalues are defined from the generalized principal components, i.e., a generalized eigenvector-based transform of the microphone signals. We further show that the smooth generalized eigenvalues can be understood as a recursive average of the instantaneous generalized eigenvalues. Simulation results comparing the multi-channel Wiener filter (MWF) with smooth and instantaneous PSD estimates indicate better speech enhancement performance for the latter. A MATLAB implementation is available online.
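
The relation between smooth and instantaneous generalized eigenvalues described in the abstract can be illustrated with a small numerical sketch. It is not the authors' MATLAB implementation, and the abstract does not give the exact definitions, so several things here are assumptions: the instantaneous generalized eigenvalue is taken to be the squared magnitude of each generalized principal component, the reference matrix `gamma` is an arbitrary Hermitian positive definite matrix, and the names `generalized_eigh`, `forgetting`, and `simulate` are hypothetical. Under these assumptions, each smooth generalized eigenvalue equals a weighted sum of the previous correlation estimate projected onto the current generalized eigenvector and the current instantaneous generalized eigenvalue.

```python
import numpy as np


def generalized_eigh(psi, gamma):
    """Solve psi @ p = lam * gamma @ p for Hermitian psi and
    Hermitian positive definite gamma (hypothetical helper).

    Returns eigenvalues in descending order and eigenvectors P
    scaled so that P^H gamma P = I and P^H psi P = diag(lam).
    """
    chol = np.linalg.cholesky(gamma)              # gamma = chol @ chol^H
    chol_inv = np.linalg.inv(chol)
    whitened = chol_inv @ psi @ chol_inv.conj().T
    whitened = 0.5 * (whitened + whitened.conj().T)  # keep it Hermitian
    eigvals, eigvecs = np.linalg.eigh(whitened)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], chol_inv.conj().T @ eigvecs[:, order]


def simulate(num_mics=4, num_frames=200, forgetting=0.9, seed=0):
    """Run recursive averaging on random frames and return, for the last
    frame, the smooth eigenvalues and their reconstruction from the
    assumed instantaneous eigenvalues."""
    rng = np.random.default_rng(seed)

    # Assumed reference matrix: identity plus a Hermitian PSD perturbation.
    a = rng.standard_normal((num_mics, num_mics)) \
        + 1j * rng.standard_normal((num_mics, num_mics))
    gamma = np.eye(num_mics) + 0.05 * (a @ a.conj().T)

    psi = 1e-3 * np.eye(num_mics, dtype=complex)  # initial correlation estimate
    for _ in range(num_frames):
        # Random stand-in for one frequency bin of the microphone signals.
        y = rng.standard_normal(num_mics) + 1j * rng.standard_normal(num_mics)

        # Recursive averaging of the correlation matrix estimate.
        psi_prev = psi
        psi = forgetting * psi_prev + (1 - forgetting) * np.outer(y, y.conj())

        # Smooth generalized eigenvalues and generalized eigenvectors.
        smooth, p = generalized_eigh(psi, gamma)

        # Generalized principal components and the assumed
        # instantaneous generalized eigenvalues.
        z = p.conj().T @ y
        instantaneous = np.abs(z) ** 2

        # Previous estimate projected onto the current eigenvectors:
        # diagonal of P^H psi_prev P.
        carried = np.real(np.einsum("ij,ik,kj->j", p.conj(), psi_prev, p))
        reconstructed = forgetting * carried + (1 - forgetting) * instantaneous

    return smooth, instantaneous, reconstructed


smooth, instantaneous, reconstructed = simulate()
```

In this sketch the instantaneous values fluctuate from frame to frame, while the smooth values change slowly because the weight `forgetting` carries most of the previous estimate forward. How the PSDs are then derived from either set of eigenvalues is specific to the paper and is not reproduced here.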

Paper Title