Paper Title

Geometry-guided Dense Perspective Network for Speech-Driven Facial Animation

Authors

Jingying Liu, Binyuan Hui, Kun Li, Yunke Liu, Yu-Kun Lai, Yuxiang Zhang, Yebin Liu, Jingyu Yang

Abstract

Realistic speech-driven 3D facial animation is a challenging problem due to the complex relationship between speech and face. In this paper, we propose a deep architecture, called Geometry-guided Dense Perspective Network (GDPnet), to achieve speaker-independent realistic 3D facial animation. The encoder is designed with dense connections to strengthen feature propagation and encourage the re-use of audio features, and the decoder is integrated with an attention mechanism to adaptively recalibrate point-wise feature responses by explicitly modeling interdependencies between different neuron units. We also introduce a non-linear face reconstruction representation as guidance for the latent space to obtain more accurate deformation, which helps resolve geometry-related deformation and benefits generalization across subjects. Huber and HSIC (Hilbert-Schmidt Independence Criterion) constraints are adopted to promote the robustness of our model and to better exploit non-linear and high-order correlations. Experimental results on a public dataset and a real scanned dataset validate the superiority of our proposed GDPnet compared with state-of-the-art models.
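The abstract mentions an HSIC (Hilbert-Schmidt Independence Criterion) constraint for capturing non-linear, high-order correlations. As a rough illustration of what such a term computes, here is a minimal NumPy sketch of the standard empirical HSIC estimate with Gaussian kernels. This is an assumption for illustration only: the kernel choice, bandwidth `sigma`, and how the paper plugs HSIC into its loss are not specified in the abstract.

```python
import numpy as np

def hsic(X, Y, sigma=1.0):
    """Empirical HSIC between paired samples X (n, dx) and Y (n, dy).

    Uses Gaussian kernels; a larger value indicates stronger statistical
    dependence between the two sets of features. Kernel and bandwidth are
    illustrative choices, not necessarily those used in the paper.
    """
    n = X.shape[0]

    def gram(Z):
        # Pairwise squared Euclidean distances -> Gaussian kernel matrix.
        sq = np.sum(Z ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
        return np.exp(-d2 / (2.0 * sigma ** 2))

    K, L = gram(X), gram(Y)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    # Biased estimator: tr(K H L H) / (n - 1)^2, always non-negative.
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

In a training objective, such a term is typically either minimized (to decorrelate two representations) or maximized (to align them); the abstract does not state which direction GDPnet uses.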
