TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge

Paper title

TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge

Authors

Bowen Pang, Huan Zhao, Gaosheng Zhang, Xiaoyue Yang, Yang Sun, Li Zhang, Qing Wang, Lei Xie

Abstract

This paper describes the TSUP team's submission to the ISCSLP 2022 Conversational Short-phrase Speaker Diarization (CSSD) challenge, which focuses on short-phrase conversations and introduces a new evaluation metric, the conversational diarization error rate (CDER). In this challenge, we explore three typical speaker diarization systems: spectral clustering (SC) based diarization, target-speaker voice activity detection (TS-VAD), and end-to-end neural diarization (EEND). Our major findings are as follows. First, the SC approach is favored over the other two approaches under the new CDER metric. Second, hyperparameter tuning is essential to CDER for all three types of speaker diarization systems; specifically, CDER decreases when the sub-segment length is set longer. Finally, multi-system fusion through DOVER-Lap worsens CDER on the challenge data. Our submitted SC system ultimately ranked third in the challenge.
