Paper Title
Unsupervised Interpretable Representation Learning for Singing Voice Separation
Paper Authors
Paper Abstract
In this work, we present a method for learning interpretable music signal representations directly from waveform signals. Our method can be trained with an unsupervised objective and relies on a denoising autoencoder model that uses a simple sinusoidal model as the decoding function to reconstruct the singing voice. To demonstrate the benefits of our method, we apply the obtained representations to the task of informed singing voice separation via binary masking, and measure the resulting separation quality by means of the scale-invariant signal-to-distortion ratio. Our findings suggest that our method can learn meaningful representations for singing voice separation while preserving conveniences of the short-time Fourier transform, such as non-negativity, smoothness, and reconstruction subject to time-frequency masking, which are desirable in audio and music source separation.
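The evaluation pipeline described in the abstract (informed separation via binary masking, scored with the scale-invariant signal-to-distortion ratio) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names `binary_mask` and `si_sdr` and the NumPy-based setup are assumptions for the example.

```python
import numpy as np

def binary_mask(voice_mag, accomp_mag):
    """Ideal binary mask over a non-negative representation:
    1 wherever the voice magnitude dominates the accompaniment."""
    return (voice_mag > accomp_mag).astype(voice_mag.dtype)

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio (SI-SDR) in dB.

    The estimate is projected onto the reference, so the score is
    invariant to any rescaling of the estimated signal."""
    # Optimal scaling of the reference toward the estimate
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference      # scaled target component
    noise = estimate - target       # residual distortion
    return 10.0 * np.log10((np.sum(target ** 2) + eps) /
                           (np.sum(noise ** 2) + eps))
```

In a masking-based workflow, `binary_mask` would be applied elementwise to the mixture's representation before decoding back to a waveform, and `si_sdr` would then compare the decoded estimate against the clean singing-voice reference.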