Paper Title


Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images

Authors

Haozhe Xie, Hongxun Yao, Shengping Zhang, Shangchen Zhou, Wenxiu Sun

Abstract

Recovering the 3D shape of an object from single or multiple images with deep neural networks has been attracting increasing attention in the past few years. Mainstream works (e.g. 3D-R2N2) use recurrent neural networks (RNNs) to sequentially fuse feature maps of input images. However, RNN-based approaches are unable to produce consistent reconstruction results when given the same input images with different orders. Moreover, RNNs may forget important features from early input images due to long-term memory loss. To address these issues, we propose a novel framework for single-view and multi-view 3D object reconstruction, named Pix2Vox++. By using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image. A multi-scale context-aware fusion module is then introduced to adaptively select high-quality reconstructions for different parts from all coarse 3D volumes to obtain a fused 3D volume. To further correct the wrongly recovered parts in the fused 3D volume, a refiner is adopted to generate the final output. Experimental results on the ShapeNet, Pix3D, and Things3D benchmarks show that Pix2Vox++ performs favorably against state-of-the-art methods in terms of both accuracy and efficiency.
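The multi-scale context-aware fusion step described above can be sketched as follows: each view contributes a coarse volume plus a per-voxel quality score, the scores are normalized across views with a softmax, and the fused volume is the score-weighted sum. This is a minimal NumPy illustration of that idea only; the function and variable names are hypothetical, and the paper's actual module learns the scores with convolutional layers at multiple scales.

```python
import numpy as np

def context_aware_fusion(coarse_volumes, scores):
    """Fuse per-view coarse 3D volumes with per-voxel adaptive weights.

    coarse_volumes: (n_views, D, D, D) occupancy predictions in [0, 1]
    scores:         (n_views, D, D, D) unnormalized per-voxel quality scores
                    (in the paper these are produced by a learned network;
                    here they are simply given as inputs)
    """
    # Softmax over the view axis so the weights at each voxel sum to 1.
    exp = np.exp(scores - scores.max(axis=0, keepdims=True))
    weights = exp / exp.sum(axis=0, keepdims=True)
    # Each voxel of the fused volume is taken mostly from the view(s)
    # that reconstructed that part with the highest score.
    return (weights * coarse_volumes).sum(axis=0)

# Toy example: two 2x2x2 coarse volumes; the scores strongly favor view 1,
# so the fused result should follow the second volume almost exactly.
v = np.stack([np.zeros((2, 2, 2)), np.ones((2, 2, 2))])
s = np.stack([np.full((2, 2, 2), -10.0), np.full((2, 2, 2), 10.0)])
fused = context_aware_fusion(v, s)
```

Because the weighting is done independently per voxel and the softmax is symmetric in its inputs, the fused result is invariant to the order of the input views, which is the property the abstract contrasts with RNN-based fusion.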
