SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance

Paper title

SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance

Authors

Zhou, Xinchi; Zhou, Dongzhan; Ouyang, Wanli; Zhou, Hang; Liu, Ziwei; Hu, Di

Abstract

Recent years have witnessed the success of deep learning on the visual sound separation task. However, existing works follow similar settings where the training and testing datasets share the same musical instrument categories, which to some extent limits the versatility of this task. In this work, we focus on a more general and challenging scenario, namely the separation of unknown musical instruments, where the categories in training and testing phases have no overlap with each other. To tackle this new setting, we propose the Separation-with-Consistency (SeCo) framework, which can accomplish the separation on unknown categories by exploiting the consistency constraints. Furthermore, to capture richer characteristics of the novel melodies, we devise an online matching strategy, which can bring stable enhancements with no cost of extra parameters. Experiments demonstrate that our SeCo framework exhibits strong adaptation ability on the novel musical categories and outperforms the baseline methods by a significant margin.
