Paper Title

SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction

Paper Authors

Zhizhuo Zhou, Shubham Tulsiani

Paper Abstract

We propose SparseFusion, a sparse view 3D reconstruction approach that unifies recent advances in neural rendering and probabilistic image generation. Existing approaches typically build on neural rendering with re-projected features but fail to generate unseen regions or handle uncertainty under large viewpoint changes. Alternate methods treat this as a (probabilistic) 2D synthesis task, and while they can generate plausible 2D images, they do not infer a consistent underlying 3D. However, we find that this trade-off between 3D consistency and probabilistic image generation does not need to exist. In fact, we show that geometric consistency and generative inference can be complementary in a mode-seeking behavior. By distilling a 3D consistent scene representation from a view-conditioned latent diffusion model, we are able to recover a plausible 3D representation whose renderings are both accurate and realistic. We evaluate our approach across 51 categories in the CO3D dataset and show that it outperforms existing methods, in both distortion and perception metrics, for sparse-view novel view synthesis.
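The core mechanism the abstract describes, optimizing a 3D-consistent scene representation so its renderings land on high-probability modes of a view-conditioned diffusion model, follows the score-distillation pattern. Below is a minimal, illustrative sketch of one such distillation step, assuming a differentiable scene representation and a view-conditioned latent diffusion model. All names here (`scene`, `diffusion`, `render`, `encode`, `add_noise`, `predict_noise`) are hypothetical stand-ins for this sketch, not SparseFusion's actual API.

```python
import torch

def distillation_step(scene, diffusion, camera, optimizer, t_range=(0.02, 0.98)):
    """One mode-seeking distillation step (score-distillation style):
    pull the scene's rendering from a sampled viewpoint toward a
    high-probability mode of the view-conditioned diffusion model.
    `scene`, `diffusion`, and their methods are hypothetical."""
    optimizer.zero_grad()

    # Render the current 3D scene from the target viewpoint
    # (differentiable w.r.t. the scene parameters).
    rgb = scene.render(camera)            # (1, 3, H, W)
    z = diffusion.encode(rgb)             # latent code of the rendering

    # Sample a diffusion timestep and perturb the latent with noise.
    t = torch.empty(1).uniform_(*t_range)
    noise = torch.randn_like(z)
    z_t = diffusion.add_noise(z, noise, t)

    # The diffusion model denoises conditioned on the relative camera
    # pose -- this is the "view-conditioned" part of the abstract.
    with torch.no_grad():
        noise_pred = diffusion.predict_noise(z_t, t, camera)

    # Score-distillation gradient: inject (noise_pred - noise) directly,
    # skipping the denoiser's Jacobian, and backpropagate through the
    # differentiable rendering into the scene parameters.
    z.backward(gradient=(noise_pred - noise))
    optimizer.step()
```

Repeating this step over many sampled viewpoints optimizes a single 3D representation whose renderings the diffusion model scores as plausible, which is the mode-seeking behavior the abstract refers to; the paper's actual objective and conditioning details differ in specifics.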
