Paper Title
Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization
Paper Authors
Paper Abstract
Due to the high performance of multi-channel speech processing, we can use the outputs of a multi-channel model as teacher labels when training a single-channel model with knowledge distillation. Conversely, it is also known that single-channel speech data can benefit multi-channel models, either by mixing it with multi-channel speech data during training or by using it for model pretraining. This paper focuses on speaker diarization and proposes to conduct the above bi-directional knowledge transfer alternately. We first introduce an end-to-end neural diarization model that can handle both single- and multi-channel inputs. Using this model, we alternately conduct i) knowledge distillation from a multi-channel model to a single-channel model and ii) finetuning from the distilled single-channel model to a multi-channel model. Experimental results on two-speaker data show that the proposed method mutually improves single- and multi-channel speaker diarization performance.
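To make the alternating procedure concrete, below is a minimal PyTorch sketch of one plausible reading of the loop, not the paper's implementation. All names (DiarizationNet, distill_step, finetune_step) are hypothetical, channel averaging stands in for whatever multi-channel front-end the paper uses, and random tensors replace real diarization data.

```python
# Hypothetical sketch of alternating bi-directional knowledge transfer:
# i) distill a multi-channel teacher into a single-channel student,
# ii) initialize a multi-channel model from the student and finetune it.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiarizationNet(nn.Module):
    """Toy end-to-end diarization model: (batch, channels, frames, feats)
    -> per-frame speaker-activity logits for a fixed number of speakers."""

    def __init__(self, n_feats=40, n_speakers=2, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(n_feats, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_speakers)

    def forward(self, x):
        # Average over channels so the same weights serve single- and
        # multi-channel input (a stand-in for the paper's front-end).
        x = x.mean(dim=1)                      # (batch, frames, feats)
        h, _ = self.encoder(x)
        return self.head(h)                    # (batch, frames, n_speakers)


def distill_step(student, teacher, multi_ch_batch, optim):
    """i) Train the single-channel student on the multi-channel teacher's
    posteriors (soft labels), feeding the student one reference channel."""
    with torch.no_grad():
        soft_labels = torch.sigmoid(teacher(multi_ch_batch))
    single_ch = multi_ch_batch[:, :1]          # keep channel 0 only
    loss = F.binary_cross_entropy_with_logits(student(single_ch), soft_labels)
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss.item()


def finetune_step(model, multi_ch_batch, labels, optim):
    """ii) Finetune on multi-channel data with reference speaker activities."""
    loss = F.binary_cross_entropy_with_logits(model(multi_ch_batch), labels)
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss.item()


# Alternate the two transfer directions for a few rounds on dummy data.
single_model, multi_model = DiarizationNet(), DiarizationNet()
for _ in range(3):
    opt_s = torch.optim.Adam(single_model.parameters(), lr=1e-3)
    for _ in range(10):                        # multi -> single distillation
        batch = torch.randn(4, 6, 100, 40)     # 6-channel dummy features
        distill_step(single_model, multi_model, batch, opt_s)

    # Initialize the multi-channel model from the distilled student,
    # then finetune it on multi-channel data.
    multi_model.load_state_dict(single_model.state_dict())
    opt_m = torch.optim.Adam(multi_model.parameters(), lr=1e-3)
    for _ in range(10):
        batch = torch.randn(4, 6, 100, 40)
        labels = torch.randint(0, 2, (4, 100, 2)).float()
        finetune_step(multi_model, batch, labels, opt_m)
```

The key design point the sketch mirrors is that a single architecture handles both input conditions, which is what allows weights to be copied directly between the single- and multi-channel models at each alternation.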