论文标题
3D随机遮挡和多层投影,用于深度摄像机的本地化
3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera Pedestrian Localization
论文作者
论文摘要
尽管基于深度学习的单眼探测方法已经取得了长足的进步,但它们仍然容易受到沉重的阻塞。使用多视图信息融合是一个潜在的解决方案,但由于缺乏注释的培训样本在现有的多视图数据集中,因此应用程序有限,这增加了过度拟合的风险。为了解决这个问题,提出了一种数据增强方法,以随机生成3D圆柱体阻塞的地面平面,该缸的平均大小为行人的平均大小,并预测了多种视图,以减轻训练过度拟合的影响。此外,每个视图的特征映射都通过使用同符,将每个视图的特征图投影到不同高度的多个平行平面,这允许CNN可以充分利用每个行人高度上的特征来推断地面上行人的位置。与最先进的基于深度学习的方法相比,提出的3Drom方法具有大大提高的性能。
Although deep-learning based methods for monocular pedestrian detection have made great progress, they are still vulnerable to heavy occlusions. Using multi-view information fusion is a potential solution but has limited applications, due to the lack of annotated training samples in existing multi-view datasets, which increases the risk of overfitting. To address this problem, a data augmentation method is proposed to randomly generate 3D cylinder occlusions, on the ground plane, which are of the average size of pedestrians and projected to multiple views, to relieve the impact of overfitting in the training. Moreover, the feature map of each view is projected to multiple parallel planes at different heights, by using homographies, which allows the CNNs to fully utilize the features across the height of each pedestrian to infer the locations of pedestrians on the ground plane. The proposed 3DROM method has a greatly improved performance in comparison with the state-of-the-art deep-learning based methods for multi-view pedestrian detection.