Towards Metrical Reconstruction of Human Faces

Paper Title

Towards Metrical Reconstruction of Human Faces

Authors

Wojciech Zielonka, Timo Bolkart, Justus Thies

Abstract

Face reconstruction and tracking is a building block of numerous applications in AR/VR, human-machine interaction, as well as medical applications. Most of these applications rely on a metrically correct prediction of the shape, especially when the reconstructed subject is put into a metrical context (i.e., when there is a reference object of known size). A metrical reconstruction is also needed for any application that measures distances and dimensions of the subject (e.g., to virtually fit a glasses frame). State-of-the-art methods for face reconstruction from a single image are trained on large 2D image datasets in a self-supervised fashion. However, due to the nature of a perspective projection, they are not able to reconstruct the actual face dimensions, and even predicting the average human face outperforms some of these methods in a metrical sense. To learn the actual shape of a face, we argue for a supervised training scheme. Since there exists no large-scale 3D dataset for this task, we annotated and unified small- and medium-scale databases. The resulting unified dataset is still a medium-scale dataset with more than 2k identities, and training purely on it would lead to overfitting. To address this, we take advantage of a face recognition network pretrained on a large-scale 2D image dataset, which provides distinct features for different faces and is robust to expression, illumination, and camera changes. Using these features, we train our face shape estimator in a supervised fashion, inheriting the robustness and generalization of the face recognition network. Our method, which we call MICA (MetrIC fAce), outperforms the state-of-the-art reconstruction methods by a large margin, both on current non-metric benchmarks and on our metric benchmarks (15% and 24% lower average error on NoW, respectively).
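The abstract argues that perspective projection prevents self-supervised 2D methods from recovering true face size. A minimal sketch can illustrate this scale ambiguity. It is not code from the paper and assumes an ideal pinhole camera with no lens distortion. The helper names (`project`, `scale_about_camera`, `face_width`, `same_image`), the landmark coordinates, and the focal length are all hypothetical. Scaling every camera-space point by `s` multiplies both X and Z by `s`, so `f * X / Z` is unchanged: a larger face placed farther away produces exactly the same image as a smaller face placed closer.

```python
import math
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def project(points: List[Point3], focal: float,
            cx: float = 0.0, cy: float = 0.0) -> List[Point2]:
    """Ideal pinhole projection of camera-space points (Z > 0), no distortion."""
    return [(focal * x / z + cx, focal * y / z + cy) for x, y, z in points]


def scale_about_camera(points: List[Point3], s: float) -> List[Point3]:
    """Scale the face by s about the camera center, i.e. bigger AND farther away."""
    return [(s * x, s * y, s * z) for x, y, z in points]


def face_width(points: List[Point3]) -> float:
    """Metric width along X (meters) spanned by the landmarks."""
    xs = [x for x, _, _ in points]
    return max(xs) - min(xs)


def same_image(a: List[Point2], b: List[Point2], tol: float = 1e-9) -> bool:
    """True if two projections coincide up to floating-point tolerance (pixels)."""
    return len(a) == len(b) and all(
        math.isclose(u1, u2, abs_tol=tol) and math.isclose(v1, v2, abs_tol=tol)
        for (u1, v1), (u2, v2) in zip(a, b)
    )


# Hypothetical landmarks (meters): eye corners, nose tip, mouth corners,
# roughly 0.6 m in front of the camera.
face = [
    (-0.045, -0.030, 0.60),
    (0.045, -0.030, 0.60),
    (0.000, 0.000, 0.58),
    (-0.025, 0.035, 0.61),
    (0.025, 0.035, 0.61),
]

focal = 1000.0  # assumed focal length in pixels
small = scale_about_camera(face, 0.8)   # a smaller face, closer to the camera
large = scale_about_camera(face, 1.25)  # a larger face, farther away

print(f"width small: {face_width(small) * 1000:.1f} mm")
print(f"width large: {face_width(large) * 1000:.1f} mm")
print("identical images:", same_image(project(small, focal), project(large, focal)))
```

The two faces differ in width (72.0 mm vs. 112.5 mm), yet their projected landmarks coincide. A loss computed only on 2D image evidence therefore cannot distinguish them without an external scale cue. That cue is what the supervised 3D training described above is meant to supply.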
