Paper Title
Multi-camera Torso Pose Estimation using Graph Neural Networks
Paper Authors
Paper Abstract
Estimating the location and orientation of humans is an essential skill for service and assistive robots. To achieve reliable estimation over a wide area such as an apartment, multiple RGBD cameras are frequently used. These setups have two drawbacks. First, they are relatively expensive. Second, they seldom perform effective data fusion across the multiple camera sources at an early stage of the processing pipeline, a shortcoming that occlusions and partial views make particularly relevant in these scenarios. The proposal presented in this paper uses graph neural networks to merge the information acquired from multiple camera sources, achieving a mean absolute error below 125 mm for location and 10 degrees for orientation using low-resolution RGB images. The experiments, conducted in an apartment with three cameras, benchmarked two different graph neural network implementations and a third architecture based on fully connected layers. The software has been released as open source in a public repository (https://github.com/vangiel/WheresTheFellow).