Paper Title

Deep Self-Supervised Hierarchical Clustering for Speaker Diarization

Authors

Prachi Singh, Sriram Ganapathy

Abstract

State-of-the-art speaker diarization systems use agglomerative hierarchical clustering (AHC), which clusters previously learned neural embeddings. While the clustering approach attempts to identify speaker clusters, the AHC algorithm does not involve any further learning. In this paper, we propose a novel algorithm for hierarchical clustering that combines speaker clustering with a representation learning framework. The proposed approach is based on the principles of self-supervised learning, where the self-supervision is derived from the clustering algorithm. The representation learning network is trained with a regularized triplet loss using the clustering solution at the current step, while the clustering algorithm uses the deep embeddings from the representation learning step. By combining self-supervision based representation learning with the clustering algorithm, we show that the proposed algorithm improves significantly (29% relative improvement) over the AHC algorithm with cosine similarity for a speaker diarization task on the CALLHOME dataset. In addition, the proposed approach improves over the state-of-the-art system with a PLDA affinity matrix, with a 10% relative improvement in diarization error rate (DER).
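The alternation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cosine_affinity`, `ahc`, `triplet_loss`), the average-linkage merge rule, the margin value, and the toy embeddings are all assumptions made for illustration. The regularization term of the loss and the network update itself are omitted; the sketch only shows how cluster labels from AHC over a cosine affinity matrix can act as self-supervision for a triplet loss.

```python
import numpy as np

def cosine_affinity(x):
    """Pairwise cosine similarity between the rows of x."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return unit @ unit.T

def ahc(affinity, n_clusters):
    """Average-linkage agglomerative clustering on a similarity matrix.

    Assumption: merging stops at a known number of clusters.
    """
    clusters = [[i] for i in range(len(affinity))]
    while len(clusters) > n_clusters:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean similarity between all members of the two clusters.
                score = affinity[np.ix_(clusters[a], clusters[b])].mean()
                if score > best:
                    best, pair = score, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(affinity), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

def triplet_loss(emb, labels, margin=0.2):
    """Mean hinge triplet loss over every (anchor, positive, negative)
    triple implied by the cluster labels (regularizer omitted)."""
    sim = cosine_affinity(emb)
    n = len(labels)
    losses = []
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue
            for q in range(n):
                if labels[q] == labels[a]:
                    continue
                # Penalize negatives that are not at least `margin`
                # less similar to the anchor than the positive is.
                losses.append(max(0.0, margin + sim[a, q] - sim[a, p]))
    return float(np.mean(losses)) if losses else 0.0

# Toy embeddings: two well-separated hypothetical "speakers".
emb = np.array([
    [1.0, 0.1], [0.9, 0.0], [1.0, -0.1],
    [0.1, 1.0], [0.0, 0.9], [-0.1, 1.0],
])
labels = ahc(cosine_affinity(emb), n_clusters=2)  # clustering step
loss = triplet_loss(emb, labels)                  # self-supervised signal
print(labels, loss)
```

In a full system the loss would be backpropagated through a representation network, and the refined embeddings would feed the next round of clustering. Here the toy clusters are already separated by more than the margin, so the loss carries no gradient signal.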
