Paper Title

Speech Quality Factors for Traditional and Neural-Based Low Bit Rate Vocoders

Authors

Wissam A. Jassim, Jan Skoglund, Michael Chinen, Andrew Hines

Abstract

This study compares the performance of different algorithms for coding speech at low bit rates. In addition to widely deployed traditional vocoders, a selection of recently developed generative-model-based coders at different bit rates is contrasted. The coded speech is analyzed with respect to different quality aspects: the accuracy of pitch period estimation, the word error rate for automatic speech recognition, and the influence of speaker gender and coding delay. A number of performance metrics for speech samples taken from a publicly available database were compared with subjective scores. Results from the subjective quality assessment do not correlate well with existing full-reference speech quality metrics. The results provide valuable insights into aspects of the speech signal that will be used to develop a novel metric to accurately predict the quality of speech from generative-model-based coders.
