Music Separation Enhancement with Generative Modeling

Paper Title

Music Separation Enhancement with Generative Modeling

Authors

Noah Schaffer, Boaz Cogan, Ethan Manilow, Max Morrison, Prem Seetharaman, Bryan Pardo

Abstract

Despite phenomenal progress in recent years, state-of-the-art music separation systems produce source estimates with significant perceptual shortcomings, such as adding extraneous noise or removing harmonics. We propose a post-processing model (the Make it Sound Good (MSG) post-processor) to enhance the output of music source separation systems. We apply our post-processing model to state-of-the-art waveform-based and spectrogram-based music source separators, including a separator unseen by MSG during training. Our analysis of the errors produced by source separators shows that waveform models tend to introduce more high-frequency noise, while spectrogram models tend to lose transients and high-frequency content. We introduce objective measures to quantify both kinds of errors and show MSG improves the source reconstruction of both kinds of errors. Crowdsourced subjective evaluations demonstrate that human listeners prefer source estimates of bass and drums that have been post-processed by MSG.
