Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation

Paper Title

Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation

Authors

Jiankun Li, Peisen Wang, Pengfei Xiong, Tao Cai, Ziwei Yan, Lei Yang, Jiangyu Liu, Haoqiang Fan, Shuaicheng Liu

Abstract

With the advent of convolutional neural networks, stereo matching algorithms have recently gained tremendous progress. However, it remains a great challenge to accurately extract disparities from real-world image pairs taken by consumer-level devices like smartphones, due to practical complicating factors such as thin structures, non-ideal rectification, camera module inconsistencies and various hard-case scenes. In this paper, we propose a set of innovative designs to tackle the problem of practical stereo matching: 1) to better recover fine depth details, we design a hierarchical network with recurrent refinement to update disparities in a coarse-to-fine manner, as well as a stacked cascaded architecture for inference; 2) we propose an adaptive group correlation layer to mitigate the impact of erroneous rectification; 3) we introduce a new synthetic dataset with special attention to difficult cases for better generalizing to real-world scenes. Our results not only rank 1st on both Middlebury and ETH3D benchmarks, outperforming existing state-of-the-art methods by a notable margin, but also exhibit high-quality details for real-life photos, which clearly demonstrates the efficacy of our contributions.

Paper Title