The Secret Source: Incorporating Source Features to Improve Acoustic-to-Articulatory Speech Inversion

Paper Title

The Secret Source: Incorporating Source Features to Improve Acoustic-to-Articulatory Speech Inversion

Authors

Yashish M. Siriwardena, Carol Espy-Wilson

Abstract

In this work, we incorporate acoustically derived source features (aperiodicity, periodicity, and pitch) as additional targets for an acoustic-to-articulatory speech inversion (SI) system. We also propose a temporal-convolution-based SI system that uses auditory spectrograms as the input speech representation, so that it can learn long-range dependencies and the complex interactions between the source and the vocal tract. Experiments are conducted on both the Wisconsin X-ray Microbeam (XRMB) and Haskins Production Rate Comparison (HPRC) datasets, with comparisons against three baseline SI model architectures. On the HPRC dataset, the proposed SI system gains an improvement of close to 28% when the source features are used as additional targets. On the XRMB dataset, the same SI system outperforms the current best-performing SI models by around 9%.

Paper Title