Paper Title
Prediction of head motion from speech waveforms with a canonical-correlation-constrained autoencoder
Paper Authors
Paper Abstract
This study investigates the direct use of speech waveforms to predict head motion for speech-driven head-motion synthesis, whereas the literature commonly uses spectral features such as MFCC as basic input features, together with additional features such as energy and F0. We show that, rather than combining different features derived from waveforms, it is more effective to use waveforms directly to predict the corresponding head motion. The challenge with the waveform-based approach is that waveforms contain a large amount of information irrelevant to predicting head motion, which hinders the training of neural networks. To overcome this problem, we propose a canonical-correlation-constrained autoencoder (CCCAE), whose hidden layers are trained not only to minimise the reconstruction error but also to maximise the canonical correlation with head motion. Compared with an MFCC-based system, the proposed system shows comparable performance in objective evaluation, and better performance in subjective evaluation.
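The abstract's core idea is a training objective that combines an autoencoder's reconstruction error with a canonical-correlation term between the hidden representation and the head-motion features. The following is a minimal NumPy sketch of that objective, not the paper's implementation: it computes classical linear CCA via the whitened cross-covariance matrix and subtracts the mean canonical correlation from the reconstruction loss. All function names, the weighting `lam`, and the regulariser `eps` are illustrative assumptions.

```python
import numpy as np

def canonical_correlations(X, Y, eps=1e-8):
    """Canonical correlations between row-wise samples X (n, dx) and Y (n, dy)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # (regularised) covariance and cross-covariance matrices
    Sxx = Xc.T @ Xc / (n - 1) + eps * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + eps * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # inverse matrix square root via eigendecomposition (S symmetric PSD)
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(np.maximum(w, eps))) @ V.T

    # singular values of the whitened cross-covariance are the canonical correlations
    T = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.clip(np.linalg.svd(T, compute_uv=False), 0.0, 1.0)

def cccae_loss(x, x_recon, hidden, head_motion, lam=1.0):
    """Reconstruction MSE minus lam * mean canonical correlation (to be minimised)."""
    recon = np.mean((x - x_recon) ** 2)
    cca = canonical_correlations(hidden, head_motion).mean()
    return recon - lam * cca
```

Minimising this loss drives the hidden layer to both reconstruct the waveform-derived input and stay maximally correlated with head motion, which is the constraint the CCCAE name refers to; in practice the paper trains this with a neural network rather than the closed-form CCA shown here.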