Paper title

On incorporating social speaker characteristics in synthetic speech

Paper authors

Sai Sirisha Rallabandi, Sebastian Möller

Abstract

In our previous work, we derived the acoustic features that contribute to the perception of warmth and competence in synthetic speech. As an extension, the current work investigates the impact of these derived vocal features on the generation of the desired characteristics. Spectral flux, F1 mean, F2 mean, and their convex combinations were explored for generating higher warmth in female speech. Voiced slope, spectral flux, and their convex combinations were investigated for generating higher competence in female speech. We employed a feature quantization approach in the traditional end-to-end Tacotron-based speech synthesis model. Listening tests showed that convex combinations of acoustic features achieve higher Mean Opinion Scores for warmth and competence than individual features.
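
To make the two terms in the abstract concrete, the following is a minimal sketch of a convex combination of acoustic features followed by quantization into discrete levels. It is an illustration under stated assumptions, not the authors' implementation: the feature values are assumed to be normalized to [0, 1], and the weights, bin edges, and function names are hypothetical. The abstract does not specify how the quantized value conditions the Tacotron model, so that step is left out.

```python
from bisect import bisect_right


def convex_combination(features, weights):
    """Weighted sum whose weights are non-negative and sum to 1."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * f for w, f in zip(weights, features))


def quantize(value, bin_edges):
    """Return the index of the bin that `value` falls into.

    `bin_edges` must be sorted in ascending order; n edges give n + 1 bins.
    """
    return bisect_right(bin_edges, value)


# Hypothetical normalized feature values for one utterance (not from the paper).
spectral_flux, f1_mean, f2_mean = 0.2, 0.6, 0.7

# Hypothetical weights; any non-negative weights summing to 1 are valid.
score = convex_combination([spectral_flux, f1_mean, f2_mean], [0.5, 0.25, 0.25])

# Hypothetical bin edges splitting [0, 1] into four levels.
level = quantize(score, [0.25, 0.5, 0.75])
```

Because the weights are non-negative and sum to 1, the combined score always lies between the smallest and largest input feature value, so the same bin edges can be reused for individual features and for their combinations.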
