Paper title
Self-Supervised Viewpoint Learning From Image Collections
Paper authors
Paper abstract
Training deep neural networks to estimate the viewpoint of objects requires large labeled training datasets. However, manually labeling viewpoints is notoriously hard, error-prone, and time-consuming. On the other hand, it is relatively easy to mine many unlabeled images of an object category from the internet, e.g., of cars or faces. We seek to answer the research question of whether such unlabeled collections of in-the-wild images can be successfully utilized to train viewpoint estimation networks for general object categories purely via self-supervision. Self-supervision here refers to the fact that the only true supervisory signal that the network has is the input image itself. We propose a novel learning framework that incorporates an analysis-by-synthesis paradigm to reconstruct images in a viewpoint-aware manner with a generative network, along with symmetry and adversarial constraints, to successfully supervise our viewpoint estimation network. We show that our approach performs competitively with fully-supervised approaches for several object categories such as human faces, cars, buses, and trains. Our work opens up further research in self-supervised viewpoint learning and serves as a robust baseline for it. We open-source our code at https://github.com/NVlabs/SSV.
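The abstract mentions a symmetry constraint but does not spell it out. As a rough illustration only, and not the authors' implementation, one common form of such a constraint is flip consistency: the viewpoint predicted for a mirrored image should be the mirror of the viewpoint predicted for the original. The sketch below assumes a hypothetical `predict_viewpoint` callable that returns (azimuth, elevation, tilt) in radians, and assumes that a horizontal flip negates azimuth and tilt while leaving elevation unchanged. All function names are made up for this example.

```python
import numpy as np


def flip_horizontal(images):
    """Mirror a batch of images shaped (N, H, W, C) left to right."""
    return images[:, :, ::-1, :]


def wrap_angle(angle):
    """Wrap angles in radians to the interval [-pi, pi)."""
    return (angle + np.pi) % (2 * np.pi) - np.pi


def flip_consistency_loss(predict_viewpoint, images):
    """Mean squared disagreement between predictions on images and their mirrors.

    predict_viewpoint is a hypothetical callable returning an (N, 3) array of
    (azimuth, elevation, tilt) in radians. Assumption: a horizontal flip
    negates azimuth and tilt and leaves elevation unchanged.
    """
    original = predict_viewpoint(images)
    mirrored = predict_viewpoint(flip_horizontal(images))
    # Viewpoint we would expect for the mirrored image under the assumption.
    expected = original * np.array([-1.0, 1.0, -1.0])
    # Compare on the circle so that angles near +pi and -pi count as close.
    diff = wrap_angle(mirrored - expected)
    return float(np.mean(diff ** 2))
```

A loss of this kind needs no viewpoint labels, since both terms come from the network's own predictions on an image and its mirror. That is the sense in which it can act as a self-supervised signal.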