Paper Title

MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training

Paper Authors

Ertuğ Karamatlı, Serap Kırbız

Paper Abstract

We introduce two unsupervised source separation methods, which involve self-supervised training from single-channel two-source speech mixtures. Our first method, mixture permutation invariant training (MixPIT), enables learning a neural network model which separates the underlying sources via a challenging proxy task without supervision from the reference sources. Our second method, cyclic mixture permutation invariant training (MixCycle), uses MixPIT as a building block in a cyclic fashion for continuous learning. MixCycle gradually converts the problem from separating mixtures of mixtures into separating single mixtures. We compare our methods to common supervised and unsupervised baselines: permutation invariant training with dynamic mixing (PIT-DM) and mixture invariant training (MixIT). We show that MixCycle outperforms MixIT and reaches a performance level very close to the supervised baseline (PIT-DM) while circumventing the over-separation issue of MixIT. Also, we propose a self-evaluation technique inspired by MixCycle that estimates model performance without utilizing any reference sources. We show that it yields results consistent with an evaluation on reference sources (LibriMix) and also with an informal listening test conducted on a real-life mixtures dataset (REAL-M).
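
To make the proxy task concrete: the abstract describes MixPIT as learning to separate without reference sources by training on mixtures of mixtures. Below is a minimal PyTorch sketch of a MixPIT-style training step under the assumption that the model produces two estimates from a mixture of two mixtures and is scored, permutation-invariantly, against the two constituent mixtures. The negative SI-SNR loss, tensor shapes, and the `mixpit_step` helper are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative MixPIT-style proxy task: the network sees a mixture of two
# mixtures and is trained to recover the constituent mixtures (not the
# unavailable reference sources) with a permutation-invariant loss.
import torch
import torch.nn as nn

def si_snr(est: torch.Tensor, ref: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Scale-invariant SNR in dB for (batch, time) signals (assumed loss choice)."""
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    proj = (est * ref).sum(-1, keepdim=True) * ref / (ref.pow(2).sum(-1, keepdim=True) + eps)
    noise = est - proj
    return 10 * torch.log10(proj.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps) + eps)

def mixpit_loss(estimates: torch.Tensor, mix1: torch.Tensor, mix2: torch.Tensor) -> torch.Tensor:
    """Permutation-invariant negative SI-SNR between two estimates and the two input mixtures.

    estimates: (batch, 2, time) model outputs; mix1, mix2: (batch, time).
    """
    perm_a = -(si_snr(estimates[:, 0], mix1) + si_snr(estimates[:, 1], mix2)) / 2
    perm_b = -(si_snr(estimates[:, 0], mix2) + si_snr(estimates[:, 1], mix1)) / 2
    return torch.minimum(perm_a, perm_b).mean()

# Hypothetical training step: `model` maps a waveform (batch, time) to (batch, 2, time).
def mixpit_step(model: nn.Module, mix1: torch.Tensor, mix2: torch.Tensor,
                optimizer: torch.optim.Optimizer) -> float:
    mom = mix1 + mix2                          # mixture of mixtures (proxy input)
    loss = mixpit_loss(model(mom), mix1, mix2) # targets are the mixtures themselves
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

MixCycle, as summarized in the abstract, reuses this building block in a cyclic fashion so that training gradually shifts from separating mixtures of mixtures toward separating single mixtures; the exact cycling schedule is not specified here and is left out of the sketch.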
