Paper Title

Geometry-guided Dense Perspective Network for Speech-Driven Facial Animation

Authors

Jingying Liu, Binyuan Hui, Kun Li, Yunke Liu, Yu-Kun Lai, Yuxiang Zhang, Yebin Liu, Jingyu Yang

Abstract

Realistic speech-driven 3D facial animation is a challenging problem due to the complex relationship between speech and face. In this paper, we propose a deep architecture, called Geometry-guided Dense Perspective Network (GDPnet), to achieve speaker-independent realistic 3D facial animation. The encoder is designed with dense connections to strengthen feature propagation and encourage the re-use of audio features, and the decoder is integrated with an attention mechanism to adaptively recalibrate point-wise feature responses by explicitly modeling interdependencies between different neuron units. We also introduce a non-linear face reconstruction representation as guidance for the latent space to obtain more accurate deformation, which helps resolve geometry-related deformation and benefits generalization across subjects. Huber and HSIC (Hilbert-Schmidt Independence Criterion) constraints are adopted to promote the robustness of our model and to better exploit non-linear and high-order correlations. Experimental results on a public dataset and a real scanned dataset validate the superiority of our proposed GDPnet compared with state-of-the-art models.
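The abstract mentions an HSIC (Hilbert-Schmidt Independence Criterion) constraint for capturing non-linear, high-order correlations. As a rough illustration of what such a term computes, here is a minimal NumPy sketch of the standard empirical HSIC estimate with Gaussian kernels. This is an assumption for illustration only: the kernel choice, bandwidth `sigma`, and how the paper plugs HSIC into its loss are not specified in the abstract.

```python
import numpy as np

def hsic(X, Y, sigma=1.0):
    """Empirical HSIC between paired samples X (n, dx) and Y (n, dy).

    Uses Gaussian kernels; a larger value indicates stronger statistical
    dependence between the two sets of features. Kernel and bandwidth are
    illustrative choices, not necessarily those used in the paper.
    """
    n = X.shape[0]

    def gram(Z):
        # Pairwise squared Euclidean distances -> Gaussian kernel matrix.
        sq = np.sum(Z ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
        return np.exp(-d2 / (2.0 * sigma ** 2))

    K, L = gram(X), gram(Y)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    # Biased estimator: tr(K H L H) / (n - 1)^2, always non-negative.
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

In a training objective, such a term is typically either minimized (to decorrelate two representations) or maximized (to align them); the abstract does not state which direction GDPnet uses.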
