DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

Paper title

DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

Authors

Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur

Abstract

Several advances have been made recently towards handling overlapping speech for speaker diarization. Since speech and natural language tasks often benefit from ensemble techniques, we propose an algorithm for combining outputs from such diarization systems through majority voting. Our method, DOVER-Lap, is inspired by the recently proposed DOVER algorithm, but is designed to handle overlapping segments in diarization outputs. We also modify the pair-wise incremental label mapping strategy used in DOVER, and propose an approximation algorithm based on weighted k-partite graph matching, which performs this mapping using a global cost tensor. We demonstrate the strength of our method by combining outputs from diverse systems (clustering-based, region proposal networks, and target-speaker voice activity detection) on the AMI and LibriCSS datasets, where it consistently outperforms the single best system. Additionally, we show that DOVER-Lap can be used for late fusion in multichannel diarization, and compares favorably with early fusion methods such as beamforming.
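To make the idea of overlap-aware majority voting concrete, here is a minimal sketch of a frame-level voting step. It is an illustration under stated assumptions, not the authors' implementation: it assumes the speaker labels of all systems have already been mapped into one shared label space (the step the abstract describes as weighted k-partite matching, which is not shown here), and it works on discrete frames rather than time segments. At each frame it estimates the number of active speakers as the weighted mean of the systems' speaker counts, then keeps that many labels with the highest weighted votes, so overlapping speech can yield more than one speaker. The function name `vote_frames` and its input layout are hypothetical.

```python
from collections import defaultdict


def vote_frames(system_outputs, weights):
    """Hypothetical overlap-aware weighted majority vote over frames.

    system_outputs: one list per system; each list holds a set of speaker
        labels per frame. Labels are ASSUMED to be already mapped to a
        shared label space across systems.
    weights: one non-negative weight per system (sum must be > 0).
    Returns a list with one set of speaker labels per frame.
    """
    n_frames = len(system_outputs[0])
    total_w = sum(weights)
    combined = []
    for t in range(n_frames):
        votes = defaultdict(float)  # label -> accumulated weight
        weighted_count = 0.0        # weighted sum of speaker counts
        for frames, w in zip(system_outputs, weights):
            speakers = frames[t]
            weighted_count += w * len(speakers)
            for spk in speakers:
                votes[spk] += w
        # Estimated number of active speakers: weighted mean count, rounded
        # (Python's round() rounds exact .5 ties to the nearest even integer).
        n_spk = int(round(weighted_count / total_w))
        # Keep the n_spk labels with the most weighted votes; ties are
        # broken by label order so the result is deterministic.
        ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
        combined.append({spk for spk, _ in ranked[:n_spk]})
    return combined


if __name__ == "__main__":
    outputs = [
        [{"A"}, {"A", "B"}, set()],
        [{"A"}, {"A", "B"}, {"B"}],
        [{"B"}, {"A"}, set()],
    ]
    print(vote_frames(outputs, [1.0, 1.0, 1.0]))
```

With equal weights, the three toy systems above combine to `[{"A"}, {"A", "B"}, set()]`: two of three systems agree on overlap in the second frame, so both speakers survive. Raising one system's weight lets it dominate frames where the systems disagree on identity.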
