Paper title

"This is Houston. Say again, please". The Behavox system for the Apollo-11 Fearless Steps Challenge (phase II)

Authors

Arseniy Gorin, Daniil Kulko, Steven Grima, Alex Glasman

Abstract

We describe the speech activity detection (SAD), speaker diarization (SD), and automatic speech recognition (ASR) experiments conducted by the Behavox team for the Interspeech 2020 Fearless Steps Challenge (FSC-2). A relatively small amount of labeled data, a large variety of speakers and channel distortions, and a specific lexicon and speaking style lead to high error rates for systems built on this data. In addition to approximately 36 hours of annotated NASA mission recordings, the organizers provided a much larger but unlabeled 19k-hour Apollo-11 corpus, which we also explore for semi-supervised training of ASR acoustic and language models, observing more than 17% relative word error rate improvement compared to training on the FSC-2 data only. We also compare several SAD and SD systems to approach the most difficult tracks of the challenge (track 1 for diarization and ASR), where long 30-minute audio recordings are provided for evaluation without segmentation or speaker information. For all systems, we report substantial performance improvements over the FSC-2 baseline systems, and we achieved first place for SD and ASR and fourth place for SAD in the challenge.
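
To make the "relative word error rate improvement" figure concrete, here is a minimal Python sketch of how WER and a relative improvement are commonly computed. It is an illustration only, not the authors' scoring code: the function names are hypothetical, and the WER values in the usage note are made-up numbers chosen to give a 17% relative improvement, not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words (row-by-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline error removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

For example, with an illustrative baseline WER of 0.30 and an improved WER of 0.249, `relative_improvement(0.30, 0.249)` is about 0.17, i.e. a 17% relative reduction, even though the absolute reduction is only 5.1 percentage points.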
