Title

Variable-Viewpoint Representations for 3D Object Recognition

Authors

Tengyu Ma, Joel Michelson, James Ainooson, Deepayan Sanyal, Xiaohan Wang, Maithilee Kunda

Abstract

For the problem of 3D object recognition, researchers using deep learning methods have developed several very different input representations, including "multi-view" snapshots taken from discrete viewpoints around an object, as well as "spherical" representations consisting of a dense map of essentially ray-traced samples of the object from all directions. These representations offer trade-offs in terms of what object information is captured and to what degree of detail it is captured, but it is not clear how to measure these information trade-offs since the two types of representations are so different. We demonstrate that both types of representations in fact exist at two extremes of a common representational continuum, essentially choosing to prioritize either the number of views of an object or the pixels (i.e., field of view) allotted per view. We identify interesting intermediate representations that lie at points in between these two extremes, and we show, through systematic empirical experiments, how accuracy varies along this continuum as a function of input information as well as the particular deep learning architecture that is used.
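To make the views-versus-resolution trade-off concrete, the sketch below (a hypothetical illustration, not the authors' code; the 256x256 pixel budget and the quadrupling step are assumed values chosen for readability) enumerates configurations along the continuum, holding the total pixel count fixed while trading the number of views against the pixels allotted per view:

```python
# Hypothetical sketch of the representational continuum described in the
# abstract: every configuration spends the same fixed pixel budget, but
# distributes it differently between view count and per-view resolution.

def continuum_configs(total_pixels=256 * 256):
    """Yield (num_views, side_length) pairs whose combined pixel
    count stays exactly at the fixed budget `total_pixels`."""
    num_views, side = 1, int(total_pixels ** 0.5)
    while side >= 1:
        yield num_views, side
        # Quadrupling the view count halves each view's side length,
        # keeping num_views * side * side constant.
        num_views, side = num_views * 4, side // 2

for views, side in continuum_configs():
    print(f"{views:6d} view(s) of {side}x{side} pixels "
          f"= {views * side * side} total pixels")
```

In this enumeration, configurations with few views and many pixels per view correspond to the multi-view end of the continuum, while configurations with many views of very few pixels each approach the spherical end, where the representation is essentially one ray-traced sample per direction.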
