Paper Title
Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes
Authors
Abstract
Detecting and recognizing human actions in videos of crowded scenes is challenging because of the complex environments and the diversity of events. Prior work typically falls short in two respects: (1) it makes little use of scene information; (2) it lacks training data for crowded and complex scenes. In this paper, we focus on improving spatio-temporal action recognition by fully exploiting scene information and by collecting new data. We use a top-down strategy to overcome these limitations. Specifically, we adopt a strong human detector to locate each person in every frame. We then apply action recognition models to learn spatio-temporal information from video frames, training on both the HIE dataset and new data with diverse scenes collected from the internet, which improves the generalization ability of our model. In addition, a semantic segmentation model extracts scene information to assist the process. As a result, our method achieves an average of 26.05 wf_mAP, ranking 1st in the ACM MM Grand Challenge 2020: Human in Events.