Paper Title

Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation

Paper Authors

Dengsheng Chen, Jun Li, Zheng Wang, Kai Xu

Paper Abstract

We present a novel approach to category-level 6D object pose and size estimation. To tackle intra-class shape variations, we learn canonical shape space (CASS), a unified representation for a large variety of instances of a certain object category. In particular, CASS is modeled as the latent space of a deep generative model of canonical 3D shapes with normalized pose. We train a variational auto-encoder (VAE) for generating 3D point clouds in the canonical space from an RGBD image. The VAE is trained in a cross-category fashion, exploiting publicly available large 3D shape repositories. Since the 3D point cloud is generated in normalized pose (with actual size), the encoder of the VAE learns a view-factorized RGBD embedding. It maps an RGBD image in an arbitrary view into a pose-independent 3D shape representation. Object pose is then estimated by contrasting it with a pose-dependent feature of the input RGBD extracted with a separate deep neural network. We integrate the learning of CASS and of pose and size estimation into an end-to-end trainable network, achieving state-of-the-art performance.
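The following is a minimal, hypothetical PyTorch sketch of the pipeline the abstract describes: an RGBD encoder producing a pose-independent latent shape code, a decoder generating a canonical pose-normalized point cloud, and a head that combines the shape code with a pose-dependent feature to regress pose and size. All module names, layer sizes, and the regression parameterization are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the CASS-style pipeline (not the authors' code).
import torch
import torch.nn as nn


class RGBDEncoder(nn.Module):
    """Maps an RGBD image to a pose-independent (view-factorized) shape code."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64, latent_dim)
        self.fc_logvar = nn.Linear(64, latent_dim)

    def forward(self, rgbd):
        h = self.backbone(rgbd)
        return self.fc_mu(h), self.fc_logvar(h)


class PointCloudDecoder(nn.Module):
    """Decodes a latent shape code into a canonical, pose-normalized point cloud."""
    def __init__(self, latent_dim=128, num_points=1024):
        super().__init__()
        self.num_points = num_points
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, num_points * 3),
        )

    def forward(self, z):
        return self.mlp(z).view(-1, self.num_points, 3)


class PoseSizeHead(nn.Module):
    """Combines the pose-independent shape code with a pose-dependent RGBD
    feature (from a separate network) to regress rotation, translation, size."""
    def __init__(self, latent_dim=128, pose_feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + pose_feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 6 + 3 + 3),  # assumed: 6D rotation + translation + size
        )

    def forward(self, shape_code, pose_feat):
        return self.mlp(torch.cat([shape_code, pose_feat], dim=-1))


def reparameterize(mu, logvar):
    """Standard VAE reparameterization trick."""
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)


if __name__ == "__main__":
    rgbd = torch.randn(2, 4, 64, 64)    # batch of RGBD crops (illustrative size)
    pose_feat = torch.randn(2, 128)     # pose-dependent feature from a separate net

    encoder, decoder, head = RGBDEncoder(), PointCloudDecoder(), PoseSizeHead()
    mu, logvar = encoder(rgbd)
    z = reparameterize(mu, logvar)
    canonical_points = decoder(z)       # (2, 1024, 3) canonical-space shape
    pose_and_size = head(mu, pose_feat) # (2, 12) rotation/translation/size
    print(canonical_points.shape, pose_and_size.shape)
```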
