Paper title
PeR-ViS: Person Retrieval in Video Surveillance using Semantic Description
Paper authors
Paper abstract
A person is usually characterized by descriptors such as age, gender, height, clothing type, pattern, and color. Such descriptors are known as attributes and/or soft-biometrics. They bridge the semantic gap between a person's description and retrieval in video surveillance. Retrieving a specific person from a semantic description query has important applications in video surveillance, and using computer vision to fully automate the person retrieval task has been gathering interest within the research community. However, the current trend mainly focuses on retrieving persons with image-based queries, which have major limitations for practical usage. Instead of using an image query, in this paper we study the problem of person retrieval in video surveillance with a semantic description. To solve this problem, we develop a deep learning-based cascade filtering approach (PeR-ViS), which uses Mask R-CNN [14] for person detection and instance segmentation and DenseNet-161 [16] for soft-biometric classification. On the standard person retrieval dataset of SoftBioSearch [6], we achieve 0.566 Average IoU and 0.792 %w $IoU > 0.4$, surpassing the current state-of-the-art by a large margin. We hope our simple, reproducible, and effective approach will help ease future research in the domain of person retrieval in video surveillance. The source code and pretrained weights are available at https://parshwa1999.github.io/PeR-ViS/.
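The two reported metrics can be illustrated with a minimal sketch. It assumes that each retrieval result is an axis-aligned bounding box compared against one ground-truth box, and that boxes are given as `(x_min, y_min, x_max, y_max)`. The names `iou` and `summarize` are hypothetical and are not taken from the PeR-ViS source code. The abstract does not specify how scores are aggregated across frames or sequences, so a plain mean over box pairs is used here.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Overlap width and height, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def summarize(
    predicted: Sequence[Box],
    ground_truth: Sequence[Box],
    threshold: float = 0.4,
) -> Tuple[float, float]:
    """Return (average IoU, fraction of pairs with IoU above threshold)."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need equally sized, non-empty box lists")
    scores: List[float] = [iou(p, g) for p, g in zip(predicted, ground_truth)]
    average_iou = sum(scores) / len(scores)
    fraction_above = sum(s > threshold for s in scores) / len(scores)
    return average_iou, fraction_above
```

Under these assumptions, `summarize(predicted, ground_truth)` returns the mean IoU and the fraction of pairs whose IoU exceeds 0.4, which correspond in spirit to the Average IoU and %w $IoU > 0.4$ figures quoted above.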