Automatic Detection of Speech Sound Disorder in Child Speech Using Posterior-based Speaker Representations

Paper Title

Automatic Detection of Speech Sound Disorder in Child Speech Using Posterior-based Speaker Representations

Authors

Ng, Si-Ioi; Ng, Cymie Wing-Yee; Wang, Jiarui; Lee, Tan

Abstract

This paper presents a macroscopic approach to the automatic detection of speech sound disorder (SSD) in child speech. Typically, SSD manifests as persistent articulation and phonological errors on specific phonemes in the language. The disorder can be detected by focally analyzing the phonemes or words elicited from the child subject. In the present study, instead of attempting to detect individual phone- and word-level errors, we propose to extract a subject-level representation from a long utterance constructed by concatenating multiple test words. A speaker verification approach and posterior features generated by deep neural network models are applied to derive various types of holistic representations. A linear classifier is trained to differentiate disordered speech from normal speech. On the task of detecting SSD in Cantonese-speaking children, experimental results show that the proposed approach achieves improved detection performance over a previous method that requires fusing phone-level detection results. Using articulatory posterior features to derive i-vectors from multiple-word utterances achieves an unweighted average recall of 78.2% and a macro F1 score of 78.0%.
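The two metrics reported in the abstract, unweighted average recall (UAR) and macro F1, both average a per-class score with equal weight per class, so the majority class does not dominate the result. The following is a minimal sketch of how they are commonly computed for a two-class task. It is not the authors' evaluation code; the class labels `"ssd"` and `"td"` and the toy predictions are hypothetical and chosen only for illustration.

```python
def per_class_counts(y_true, y_pred, label):
    """Return (true positives, false positives, false negatives) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall, each class weighted equally."""
    labels = sorted(set(y_true))
    recalls = []
    for label in labels:
        tp, _, fn = per_class_counts(y_true, y_pred, label)
        recalls.append(tp / (tp + fn))  # tp + fn > 0 because label occurs in y_true
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred):
    """Mean of per-class F1, each class weighted equally."""
    labels = sorted(set(y_true))
    f1_scores = []
    for label in labels:
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy data: 4 disordered ("ssd") and 6 typical ("td") subjects.
y_true = ["ssd"] * 4 + ["td"] * 6
y_pred = ["ssd", "ssd", "ssd", "td"] + ["td"] * 5 + ["ssd"]

uar = unweighted_average_recall(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"UAR = {uar:.3f}, macro F1 = {f1:.3f}")
```

In this toy case the per-class recalls are 3/4 and 5/6, so the UAR is their plain average regardless of the 4-to-6 class imbalance.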

Paper Title