PanorAMS: Automatic Annotation for Detecting Objects in Urban Context

Paper Title

PanorAMS: Automatic Annotation for Detecting Objects in Urban Context

Authors

Inske Groenen, Stevan Rudinac, Marcel Worring

Abstract

Large collections of geo-referenced panoramic images are freely available for cities across the globe, as are detailed maps with location and metadata on a great variety of urban objects. They provide a potentially rich source of information on urban objects, but manual annotation for object detection is costly, laborious and difficult. Can we utilize such multimedia sources to automatically annotate street-level images as an inexpensive alternative to manual labeling? With the PanorAMS framework, we introduce a method to automatically generate bounding box annotations for panoramic images based on urban context information. Following this method, we acquire large-scale, albeit noisy, annotations for an urban dataset solely from open data sources, in a fast and automatic manner. The dataset covers the City of Amsterdam and includes over 14 million noisy bounding box annotations of 22 object categories present in 771,299 panoramic images. For many objects, further fine-grained information obtained from geospatial metadata is available, such as building value, function and average surface area. Such information would have been difficult, if not impossible, to acquire via manual labeling based on the image alone. For detailed evaluation, we introduce an efficient crowdsourcing protocol for bounding box annotations in panoramic images, which we deploy to acquire 147,075 ground-truth object annotations for a subset of 7,348 images, the PanorAMS-clean dataset. For our PanorAMS-noisy dataset, we provide an extensive analysis of the noise and of how different types of noise affect image classification and object detection performance. We make both datasets, PanorAMS-noisy and PanorAMS-clean, as well as the benchmarks and tools presented in this paper, openly available.
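
The abstract does not say how map objects become image boxes. The sketch below shows one generic way a geo-referenced object could be projected into a panorama. It is not the PanorAMS implementation. It assumes:

- an equirectangular panorama (360° wide, 180° tall) whose centre column faces the recorded camera heading;
- a known camera height;
- a flat-earth approximation that is adequate over tens of metres;
- an object footprint reduced to one point, with an assumed width and height.

All names (`Camera`, `local_offset_m`, `project_box`) are hypothetical. The sketch ignores occlusion, terrain slope, and boxes that wrap around the panorama seam. Those simplifications are one plausible source of the kind of label noise the abstract describes.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius

@dataclass
class Camera:
    lat: float          # camera latitude in degrees
    lon: float          # camera longitude in degrees
    heading_deg: float  # compass bearing at the panorama's centre column (assumed convention)
    height_m: float     # camera height above the ground (assumed known)

def local_offset_m(cam_lat, cam_lon, lat, lon):
    """East/north offset in metres, using a local flat-earth approximation."""
    d_lat = math.radians(lat - cam_lat)
    d_lon = math.radians(lon - cam_lon)
    east = d_lon * math.cos(math.radians(cam_lat)) * EARTH_RADIUS_M
    north = d_lat * EARTH_RADIUS_M
    return east, north

def project_box(cam, lat, lon, obj_width_m, obj_height_m, img_w, img_h):
    """Return (x_min, y_min, x_max, y_max) in pixels for an upright object on flat ground.

    Hypothetical sketch: no occlusion handling and no wrapping at the image seam,
    so x_min can be negative or x_max can exceed img_w near the seam.
    """
    east, north = local_offset_m(cam.lat, cam.lon, lat, lon)
    dist = math.hypot(east, north)
    # Compass bearing from camera to object: 0 = north, 90 = east.
    bearing = math.degrees(math.atan2(east, north)) % 360.0
    # Horizontal angle relative to the centre column, wrapped to [-180, 180).
    rel = (bearing - cam.heading_deg + 180.0) % 360.0 - 180.0
    px_per_deg_x = img_w / 360.0
    x_centre = img_w / 2 + rel * px_per_deg_x
    half_width_deg = math.degrees(math.atan2(obj_width_m / 2, dist))
    x_min = x_centre - half_width_deg * px_per_deg_x
    x_max = x_centre + half_width_deg * px_per_deg_x
    # Elevation angles of the object's top and base; the horizon is the centre row.
    top_elev = math.degrees(math.atan2(obj_height_m - cam.height_m, dist))
    base_elev = math.degrees(math.atan2(-cam.height_m, dist))
    px_per_deg_y = img_h / 180.0
    y_min = img_h / 2 - top_elev * px_per_deg_y
    y_max = img_h / 2 - base_elev * px_per_deg_y
    return x_min, y_min, x_max, y_max
```

As a usage example, consider a camera facing north at 2.5 m height and an object 10 m due east of it. The object lands three quarters of the way across the image, 90° to the right of the centre column. A 2.5 m object 10 m due north would have its top exactly on the horizon row. Angular extents are used rather than a pinhole model because each column of an equirectangular panorama corresponds to a fixed azimuth step.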
