Paper Title

Dysfluencies Seldom Come Alone -- Detection as a Multi-Label Problem

Authors

Bayerl, Sebastian P., Wagner, Dominik, Hönig, Florian, Bocklet, Tobias, Nöth, Elmar, Riedhammer, Korbinian

Abstract

Specially adapted speech recognition models are necessary to handle stuttered speech. For these to be used in a targeted manner, stuttered speech must be reliably detected. Recent works have treated stuttering as a multi-class classification problem or viewed detecting each dysfluency type as an isolated task; that does not capture the nature of stuttering, where one dysfluency seldom comes alone, i.e., co-occurs with others. This work explores an approach based on a modified wav2vec 2.0 system for end-to-end stuttering detection and classification as a multi-label problem. The method is evaluated on combinations of three datasets containing English and German stuttered speech, yielding state-of-the-art results for stuttering detection on the SEP-28k-Extended dataset. Experimental results provide evidence for the transferability of features and the generalizability of the method across datasets and languages.
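The distinction between multi-class and multi-label prediction that the abstract draws can be illustrated with a minimal sketch. This is not the authors' system: the label names, the logit values, and the 0.5 threshold are assumptions chosen for illustration, and the logits stand in for the output of some classification head on top of an acoustic encoder. A multi-class decision (softmax plus argmax) can return only one dysfluency type per clip, whereas a multi-label decision (one sigmoid per type) can report several co-occurring types.

```python
import math

# Hypothetical label set, for illustration only (not taken from the abstract).
LABELS = ["block", "prolongation", "sound_repetition",
          "word_repetition", "interjection"]


def softmax(logits):
    """Normalize logits into one probability distribution over all labels."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Map a single logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_multiclass(logits):
    """Multi-class: labels compete, exactly one label is returned."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return [LABELS[best]]


def predict_multilabel(logits, threshold=0.5):
    """Multi-label: each label is decided independently (threshold assumed)."""
    return [label for label, x in zip(LABELS, logits)
            if sigmoid(x) >= threshold]


# Made-up logits for one clip containing both a block and a sound repetition.
logits = [2.1, -1.3, 1.7, -0.4, -2.0]

print(predict_multiclass(logits))  # ['block']
print(predict_multilabel(logits))  # ['block', 'sound_repetition']
```

In this toy case the multi-class decision discards the sound repetition even though its score is high, which is the limitation the multi-label formulation is meant to avoid. In training, the multi-label view would typically be paired with a per-label binary cross-entropy loss rather than a categorical cross-entropy loss.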
