Paper Title
Rethinking Depth Estimation for Multi-View Stereo: A Unified Representation
Paper Authors
Paper Abstract
Depth estimation is formulated as either a regression or a classification problem in existing learning-based multi-view stereo methods. Although both representations have recently demonstrated excellent performance, they still have apparent shortcomings: regression methods tend to overfit because they learn the cost volume only indirectly, and classification methods cannot infer the exact depth because their predictions are discrete. In this paper, we propose a novel representation, termed Unification, to unify the advantages of regression and classification. It can directly constrain the cost volume like classification methods, yet also realize sub-pixel depth prediction like regression methods. To excavate the potential of Unification, we design a new loss function named Unified Focal Loss, which is more uniform and reasonable in combating the challenge of sample imbalance. Combining these two unburdened modules, we present a coarse-to-fine framework that we call UniMVSNet. Ranking first on both the DTU and the Tanks and Temples benchmarks verifies that our model not only performs the best but also has the best generalization ability.
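To make the abstract's core idea concrete, here is a minimal, hypothetical sketch of how a unified representation can combine the two paradigms: like classification, the network scores each depth hypothesis directly on the cost volume; like regression, a continuous value attached to the peak hypothesis recovers a sub-pixel depth. The decoding rule, names, and shapes below are illustrative assumptions, not the exact UniMVSNet formulation.

```python
import numpy as np

def unified_depth(hypotheses, unity):
    """Decode a continuous depth from per-hypothesis scores.

    hypotheses: (D,) sampled depth values, ascending with uniform spacing
    unity:      (D,) per-hypothesis scores in [0, 1] predicted by the network
    Returns a single depth that may fall between two hypotheses.
    """
    i = int(np.argmax(unity))                 # classification-style peak pick
    interval = hypotheses[1] - hypotheses[0]  # uniform hypothesis spacing
    # Regression-style refinement: shift the peak hypothesis by a fraction of
    # one interval derived from its score (an assumed decoding rule).
    return float(hypotheses[i] + (unity[i] - 0.5) * interval)

# Toy example: 48 hypotheses over a DTU-like depth range, with a smooth
# score peak near 600 mm standing in for the network's prediction.
depths = np.linspace(425.0, 935.0, 48)
scores = np.exp(-0.5 * ((depths - 600.0) / 15.0) ** 2)
scores /= scores.max()
print(unified_depth(depths, scores))
```

The point of the sketch is only the inference structure: an argmax over hypotheses keeps the direct supervision of the cost volume, while the continuous score at the peak restores sub-pixel precision that a pure one-hot classification would lose.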