Paper Title
ApproxDet: Content and Contention-Aware Approximate Object Detection for Mobiles
Paper Authors
Paper Abstract
Advanced video analytic systems, including scene classification and object detection, have seen widespread success in various domains such as smart cities and autonomous transportation. With an ever-growing number of powerful client devices, there is incentive to move these heavy video analytics workloads from the cloud to mobile devices to achieve low latency and real-time processing and to preserve user privacy. However, most video analytic systems are heavyweight and are trained offline with some pre-defined latency or accuracy requirements. This makes them unable to adapt at runtime in the face of three types of dynamism -- the input video characteristics change, the amount of compute resources available on the node changes due to co-located applications, and the user's latency-accuracy requirements change. In this paper we introduce ApproxDet, an adaptive video object detection framework for mobile devices to meet accuracy-latency requirements in the face of changing content and resource contention scenarios. To achieve this, we introduce a multi-branch object detection kernel (layered on Faster R-CNN), which incorporates a data-driven modeling approach on the performance metrics, and a latency SLA-driven scheduler to pick the best execution branch at runtime. We couple this kernel with approximable video object tracking algorithms to create an end-to-end video object detection system. We evaluate ApproxDet on a large benchmark video dataset and compare quantitatively to AdaScale and YOLOv3. We find that ApproxDet is able to adapt to a wide variety of contention and content characteristics and outshines all baselines, e.g., it achieves 52% lower latency and 11.1% higher accuracy over YOLOv3.
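The abstract describes a latency SLA-driven scheduler that, at runtime, picks the best execution branch of a multi-branch detection kernel given predicted latency and accuracy under current content and contention. A minimal sketch of that selection logic is below; all names, branch configurations, and numbers are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of ApproxDet-style branch scheduling: among a set of
# approximation branches (each with a modeled latency and accuracy for the
# current content/contention conditions), pick the most accurate branch whose
# predicted latency fits within the latency SLA. Branch names, latencies, and
# accuracies here are made-up illustrations.

from dataclasses import dataclass

@dataclass
class Branch:
    name: str            # e.g. input resolution / proposal count / tracker choice
    latency_ms: float    # predicted latency under current resource contention
    accuracy: float      # predicted detection accuracy on current content

def pick_branch(branches, latency_sla_ms):
    """Return the highest-accuracy branch meeting the latency SLA.

    Falls back to the fastest branch if no branch fits the SLA.
    """
    feasible = [b for b in branches if b.latency_ms <= latency_sla_ms]
    if feasible:
        return max(feasible, key=lambda b: b.accuracy)
    return min(branches, key=lambda b: b.latency_ms)

# Illustrative branch profiles (not from the paper):
branches = [
    Branch("full-res, 100 proposals", latency_ms=180.0, accuracy=0.78),
    Branch("half-res, 50 proposals",  latency_ms=90.0,  accuracy=0.71),
    Branch("track-only (no detect)",  latency_ms=25.0,  accuracy=0.60),
]

print(pick_branch(branches, latency_sla_ms=100.0).name)
```

In the real system, the latency and accuracy fields would come from the data-driven performance models re-evaluated as video content and co-located workloads change; the selection step itself stays this simple.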