Paper title

Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform

Authors

Tomohiko Nakamura, Hiroshi Saruwatari

Abstract

We propose a time-domain audio source separation method using down-sampling (DS) and up-sampling (US) layers based on a discrete wavelet transform (DWT). The proposed method is based on one of the state-of-the-art deep neural networks, Wave-U-Net, which successively down-samples and up-samples feature maps. We find that this architecture resembles that of multiresolution analysis, and reveal that the DS layers of Wave-U-Net cause aliasing and may discard information useful for the separation. Although the effects of these problems may be reduced by training, to achieve a more reliable source separation method, we should design DS layers capable of overcoming the problems. With this belief, focusing on the fact that the DWT has an anti-aliasing filter and the perfect reconstruction property, we design the proposed layers. Experiments on music source separation show the efficacy of the proposed method and the importance of simultaneously considering the anti-aliasing filters and the perfect reconstruction property.
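The contrast the abstract draws between aliasing-prone down-sampling and DWT-based down-sampling can be illustrated with a toy one-dimensional sketch. It is not the authors' implementation: the Haar wavelet, the function names, and the use of plain decimation (keeping every second sample) as the baseline are all assumptions made for illustration, and the actual layers operate on multichannel feature maps inside a network.

```python
import math


def decimate(x):
    """Baseline (assumed): keep every second sample with no filtering."""
    return x[::2]


def haar_dwt_down(x):
    """One-level Haar DWT: split x into low and high subbands of half length."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    low = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high


def haar_dwt_up(low, high):
    """Inverse one-level Haar DWT: interleave the samples recovered from both subbands."""
    s = 1.0 / math.sqrt(2.0)
    x = []
    for l, h in zip(low, high):
        x.append(s * (l + h))
        x.append(s * (l - h))
    return x


# A signal oscillating at the highest representable frequency.
signal = [1.0, -1.0] * 4

# Plain decimation turns it into a constant: the oscillation aliases to DC.
aliased = decimate(signal)  # [1.0, 1.0, 1.0, 1.0]

# The Haar low band is (numerically) zero, the oscillation is kept in the
# high band, and the two subbands together recover the input.
low, high = haar_dwt_down(signal)
reconstructed = haar_dwt_up(low, high)
```

In this toy case the decimated output cannot be told apart from a constant input, whereas the two Haar subbands together retain everything needed to rebuild the signal, which is the perfect reconstruction property the abstract refers to.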
