Multi-channel U-Net for Music Source Separation

Paper title

Multi-channel U-Net for Music Source Separation

Authors

Kadandale, Venkatesh S.; Montesinos, Juan F.; Haro, Gloria; Gómez, Emilia

Abstract

A fairly straightforward approach to music source separation is to train independent models, each dedicated to estimating a single source. Training a single model to estimate multiple sources generally does not perform as well as the independent dedicated models. However, Conditioned U-Net (C-U-Net) uses a control mechanism to train a single model for multi-source separation and attempts to achieve performance comparable to that of the dedicated models. We propose a multi-channel U-Net (M-U-Net), trained using a weighted multi-task loss, as an alternative to the C-U-Net. We investigate two weighting strategies for our multi-task loss: 1) Dynamic Weighted Average (DWA), and 2) Energy Based Weighting (EBW). DWA determines the weights by tracking the rate of change of the loss of each task during training. EBW aims to neutralize the training bias arising from the difference in energy levels of the sources in a mixture. Our methods provide three advantages over C-U-Net: 1) fewer effective training iterations per epoch, 2) fewer trainable network parameters (no control parameters), and 3) faster processing at inference. Our methods achieve performance comparable to that of C-U-Net and the dedicated U-Nets at a much lower training cost.
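The two weighting strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so everything below is an assumption: the DWA function uses one common formulation (a softmax over the ratio of each task's two most recent losses, with a hypothetical temperature parameter), and the EBW function simply weights each source inversely to its energy, scaled so the highest-energy source gets weight 1. The function names and parameters are illustrative and are not taken from the paper.

```python
import math


def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Sketch of Dynamic Weighted Average (assumed formulation).

    Each task's weight grows with the ratio of its last two losses,
    so tasks whose loss is decreasing slowly receive more weight.
    The weights are normalized to sum to the number of tasks.
    """
    ratios = [p / pp for p, pp in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    num_tasks = len(ratios)
    return [num_tasks * e / total for e in exps]


def ebw_weights(source_energies):
    """Sketch of energy-based weighting (assumed inverse-energy form).

    Low-energy sources get larger weights so that high-energy sources
    do not dominate the summed loss.
    """
    max_energy = max(source_energies)
    return [max_energy / e for e in source_energies]


def weighted_multitask_loss(losses, weights):
    """Weighted sum of the per-source losses."""
    return sum(w * l for w, l in zip(weights, losses))


if __name__ == "__main__":
    # Hypothetical per-source losses from the two previous epochs.
    w_dwa = dwa_weights([0.8, 0.5, 0.9], [1.0, 1.0, 1.0])
    # Hypothetical average energies of three sources.
    w_ebw = ebw_weights([4.0, 1.0, 2.0])
    print(w_dwa, w_ebw)
    print(weighted_multitask_loss([0.8, 0.5, 0.9], w_ebw))
```

In this sketch the multi-channel model would produce one output per source, compute one loss per output, and combine them with either set of weights before backpropagation.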
