Paper Title

Robust Phonetic Segmentation Using Spectral Transition Measure for Non-Standard Recording Environments

Authors

Bhavik Vachhani, Chitralekha Bhat, Sunil Kopparapu

Abstract

Phone-level localization of mis-articulation is a key requirement for an automatic articulation error assessment system. A robust phone segmentation technique is essential for real-time assessment of phone-level mis-articulations in speech recorded on mobile phones or tablets. This is a non-standard recording set-up with little control over recording quality. We propose a novel post-processing technique to aid Spectral Transition Measure (STM)-based phone segmentation under noisy conditions, such as environmental noise and clipping, that are commonly present in mobile phone recordings. We compare the performance of our approach with phone segmentation using traditional MFCC and PLPCC speech features under Gaussian noise and clipping. The proposed approach was validated on TIMIT and a Hindi speech corpus. It was also used to compute phone boundaries for a set of utterances recorded simultaneously on three devices (a laptop, a stationary tablet, and a handheld mobile phone) to simulate different audio qualities in a real-time non-standard recording environment. F-ratio was the metric used to compute the accuracy of phone boundary marking. Experimental results show an improvement of 7% for TIMIT and 10% for Hindi data over the baseline approach. Similar results were seen for the set of three recordings collected in-house.
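For readers unfamiliar with STM-based segmentation, the following is a minimal sketch of the commonly cited textbook formulation: at each frame, a regression slope is fitted to every feature coefficient over a short window, the squared slopes are averaged, and peaks of the resulting curve are taken as candidate phone boundaries. It does not reproduce the post-processing technique proposed in the paper, which the abstract does not describe. The function names, the window size, and the threshold are all assumptions chosen for illustration.

```python
def spectral_transition_measure(frames, half_window=2):
    """Return one STM value per frame.

    frames: list of per-frame feature vectors (for example cepstral
    coefficients); all vectors must have the same length.
    half_window: number of frames on each side used for the slope fit
    (an illustrative choice, not a value from the paper).
    """
    n_frames = len(frames)
    dim = len(frames[0])
    offsets = range(-half_window, half_window + 1)
    denom = sum(k * k for k in offsets)
    stm = [0.0] * n_frames  # edge frames without a full window stay at 0
    for n in range(half_window, n_frames - half_window):
        total = 0.0
        for i in range(dim):
            # Least-squares slope of coefficient i around frame n.
            slope = sum(k * frames[n + k][i] for k in offsets) / denom
            total += slope * slope
        stm[n] = total / dim
    return stm


def pick_boundaries(stm, threshold):
    """Return indices of local maxima of the STM curve above threshold."""
    return [
        n
        for n in range(1, len(stm) - 1)
        if stm[n] > threshold and stm[n] >= stm[n - 1] and stm[n] > stm[n + 1]
    ]


# Toy input: ten frames of one "phone" followed by ten frames of another.
frames = [[0.0, 0.0] for _ in range(10)] + [[1.0, 1.0] for _ in range(10)]
stm = spectral_transition_measure(frames, half_window=2)
boundaries = pick_boundaries(stm, threshold=0.05)
print(boundaries)
```

On real speech, the frames would be feature vectors extracted from short overlapping windows of audio, and the threshold would need tuning for the noise level. That sensitivity to noise and clipping is the weakness the abstract says the proposed post-processing addresses.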
