Paper Title
An End-to-End Mispronunciation Detection System for L2 English Speech Leveraging Novel Anti-Phone Modeling
Paper Authors
Paper Abstract
Mispronunciation detection and diagnosis (MDD) is a core component of computer-assisted pronunciation training (CAPT). Most existing MDD approaches focus on categorical errors (i.e., one canonical phone substituted by another, alongside mispronunciations caused by deletions or insertions). However, accurate detection and diagnosis of non-categorical or distortion errors (i.e., approximating L2 phones with L1 (first-language) phones, or producing erroneous pronunciations that lie somewhere in between) still seems out of reach. In view of this, we propose to conduct MDD with a novel end-to-end automatic speech recognition (E2E-based ASR) approach. In particular, we expand the original L2 phone set with a corresponding set of anti-phones, giving the E2E-based MDD approach a better capability to handle both categorical and non-categorical mispronunciations and thereby to provide better mispronunciation detection and diagnosis feedback. Furthermore, a novel transfer-learning paradigm is devised to obtain the initial model estimate of the E2E-based MDD system without recourse to any phonological rules. Extensive experiments on the L2-ARCTIC dataset show that our best system outperforms the existing E2E baseline system and the pronunciation-scoring-based method (goodness of pronunciation, GOP) in terms of F1-score, by 11.05% and 27.71%, respectively.
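To make the anti-phone idea and the F1-based evaluation more concrete, here is a minimal sketch under several assumptions that are not taken from the paper. Anti-phones are written with a hypothetical `anti_` prefix. The canonical, annotated, and recognized phone sequences are assumed to be already aligned one-to-one; real systems align them with an edit-distance procedure so that insertions and deletions are also handled. Precision and recall follow a common MDD counting convention: true rejections (TR), false rejections (FR), and false acceptances (FA). All function names are illustrative only.

```python
from typing import List, Sequence

# Hypothetical label convention for anti-phones (not the paper's notation).
ANTI_PREFIX = "anti_"


def expand_phone_set(phones: Sequence[str]) -> List[str]:
    """Append one anti-phone per L2 phone, doubling the output vocabulary."""
    return list(phones) + [ANTI_PREFIX + p for p in phones]


def classify(canonical: str, recognized: str) -> str:
    """Diagnose one aligned position from the recognizer's output label."""
    if recognized == canonical:
        return "correct"
    if recognized == ANTI_PREFIX + canonical:
        # Anti-phone output: "not quite this phone" (non-categorical error).
        return "distortion"
    # A different phone from the set: categorical (substitution) error.
    return "substitution"


def mdd_f1(canonical: Sequence[str],
           annotated: Sequence[str],
           recognized: Sequence[str]) -> float:
    """F1 of mispronunciation detection over position-aligned sequences.

    annotated: human labels (may contain anti-phones for distortions).
    recognized: system output labels over the expanded phone set.
    """
    if not (len(canonical) == len(annotated) == len(recognized)):
        raise ValueError("sequences must be aligned to the same length")
    tr = fr = fa = 0
    for c, h, r in zip(canonical, annotated, recognized):
        actual_error = h != c      # the speaker really mispronounced
        detected_error = r != c    # the system flags a mispronunciation
        if actual_error and detected_error:
            tr += 1                # true rejection
        elif detected_error:
            fr += 1                # false rejection (correct phone flagged)
        elif actual_error:
            fa += 1                # false acceptance (error missed)
    precision = tr / (tr + fr) if tr + fr else 0.0
    recall = tr / (tr + fa) if tr + fa else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    canonical = ["k", "ae", "t"]
    annotated = ["k", "anti_ae", "d"]   # distortion of /ae/, /t/ -> /d/
    recognized = ["k", "anti_ae", "t"]  # catches the distortion, misses /d/
    print([classify(c, r) for c, r in zip(canonical, recognized)])
    print(mdd_f1(canonical, annotated, recognized))
```

In the toy example, the system catches the distortion of /ae/ (one true rejection), flags no correct phone (precision 1.0), and misses the /t/ to /d/ substitution (recall 0.5), so F1 is 2/3. The sketch shows why the expanded label set matters: without `anti_ae` in the vocabulary, the recognizer could only map a distorted /ae/ onto some other categorical phone or onto /ae/ itself.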