Paper Title

Single Image Depth Estimation Trained via Depth from Defocus Cues

Authors

Shir Gur, Lior Wolf

Abstract

Estimating depth from a single RGB image is a fundamental task in computer vision, most directly solved using supervised deep learning. In unsupervised learning of depth from a single RGB image, depth is not given explicitly; existing work in the field receives either a stereo pair, a monocular video, or multiple views, and trains a depth estimation network using losses based on structure from motion. In this work, we rely instead on depth-from-defocus cues rather than on different views. Learning is based on a novel Point Spread Function convolutional layer, which applies location-specific kernels that arise from the circle of confusion at each image location. We evaluate our method on data derived from five common datasets for depth estimation and light-field images, and present results that are on par with supervised methods on the KITTI and Make3D datasets and outperform unsupervised learning approaches. Since the phenomenon of depth from defocus is not dataset specific, we hypothesize that learning based on it would overfit less to the specific content of each dataset. Our experiments show that this is indeed the case: an estimator learned on one dataset with our method provides better results on other datasets than directly supervised methods do.
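
The core mechanism can be pictured with a short sketch. Below is a minimal, hypothetical PyTorch illustration (not the authors' implementation) of a PSF-style rendering layer: it derives the per-pixel circle-of-confusion diameter from the thin-lens model, c = A · |d − d_f| / d · f / (d_f − f), where d is scene depth, d_f the focus distance, f the focal length, and A the aperture diameter, and it approximates the location-specific kernel by interpolating between uniformly blurred copies of the image. All camera parameters (focus_dist, focal_len, aperture, px_per_m) and the Gaussian choice of PSF are illustrative assumptions.

```python
# A minimal sketch of a depth-dependent PSF rendering layer.
# Hypothetical illustration; parameters and the Gaussian PSF are assumptions.
import torch
import torch.nn.functional as F

def gaussian_blur(img, sigma):
    """Separable Gaussian blur of img with shape (B, C, H, W)."""
    if sigma <= 0:
        return img
    radius = max(1, int(3 * sigma))
    x = torch.arange(-radius, radius + 1, dtype=img.dtype, device=img.device)
    k = torch.exp(-x ** 2 / (2 * sigma ** 2))
    k = (k / k.sum()).view(1, 1, 1, -1)
    c = img.shape[1]
    # horizontal then vertical pass, one depthwise conv each
    img = F.conv2d(img, k.expand(c, 1, 1, -1), padding=(0, radius), groups=c)
    img = F.conv2d(img, k.transpose(2, 3).expand(c, 1, -1, 1),
                   padding=(radius, 0), groups=c)
    return img

def psf_render(sharp, depth, focus_dist=2.0, focal_len=0.05,
               aperture=0.02, px_per_m=8000.0, n_levels=8, max_sigma=4.0):
    """Render a defocused image from an all-in-focus image `sharp`
    (B, 3, H, W) and a depth map `depth` (B, 1, H, W) in metres, by
    blending a bank of uniformly blurred copies per pixel."""
    # thin-lens circle-of-confusion diameter on the sensor, then in pixels
    coc = aperture * (depth - focus_dist).abs() / depth \
          * focal_len / (focus_dist - focal_len)
    coc_px = (coc * px_per_m).clamp(0.0, max_sigma)
    # fractional index of each pixel into the blur bank
    idx = coc_px / max_sigma * (n_levels - 1)
    out = torch.zeros_like(sharp)
    for i, s in enumerate(torch.linspace(0.0, max_sigma, n_levels)):
        blurred = gaussian_blur(sharp, float(s))
        # triangular weights linearly interpolate adjacent blur levels
        w = (1.0 - (idx - i).abs()).clamp(min=0.0)
        out = out + w * blurred
    return out
```

Every operation above is differentiable in the depth map, so under this kind of formulation a depth network can be trained by comparing the rendered defocused image against a real (or simulated) defocused input, with no ground-truth depth required; this is the sense in which depth-from-defocus cues supervise the estimator.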
