Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach

Paper title

Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach

Authors

Yueyu Hu, Shuai Yang, Wenhan Yang, Ling-Yu Duan, Jiaying Liu

Abstract

The past decades have witnessed the rapid development of image and video coding techniques in the era of big data. However, the signal fidelity-driven coding pipeline design limits the capability of the existing image/video coding frameworks to fulfill the needs of both machine and human vision. In this paper, we come up with a novel image coding framework by leveraging both the compressive and the generative models, to support machine vision and human perception tasks jointly. Given an input image, the feature analysis is first applied, and then the generative model is employed to perform image reconstruction with features and additional reference pixels, in which compact edge maps are extracted in this work to connect both kinds of vision in a scalable way. The compact edge map serves as the basic layer for machine vision tasks, and the reference pixels act as a sort of enhanced layer to guarantee signal fidelity for human vision. By introducing advanced generative models, we train a flexible network to reconstruct images from compact feature representations and the reference pixels. Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection, which provide useful evidence on the emerging standardization efforts on MPEG VCM (Video Coding for Machine).
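The layered idea in the abstract (a base layer that is decodable on its own for machine vision, plus an optional enhancement layer for human viewing) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the `ScalableCode` container, the gradient-threshold edge detector standing in for feature analysis, the regular-grid sampling of reference pixels, and in particular the nearest-sample fill that stands in for the generative decoder. The paper's decoder conditions on both edge maps and reference pixels, whereas this placeholder uses only the pixels.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]   # grayscale image, values 0..255
Coord = Tuple[int, int]   # (row, column)


@dataclass
class ScalableCode:
    """Hypothetical two-layer container (not the paper's bitstream format)."""
    height: int
    width: int
    base_edges: List[Coord]                      # base layer: edge coordinates
    stride: int = 2                              # grid spacing of reference pixels
    enhancement: Optional[Dict[Coord, int]] = None  # enhancement layer, optional


def extract_edges(img: Image, threshold: int = 64) -> List[Coord]:
    """Stand-in for feature analysis: flag large forward differences."""
    h, w = len(img), len(img[0])
    edges = []
    for y in range(h):
        for x in range(w):
            dx = abs(img[y][min(x + 1, w - 1)] - img[y][x])
            dy = abs(img[min(y + 1, h - 1)][x] - img[y][x])
            if max(dx, dy) >= threshold:
                edges.append((y, x))
    return edges


def sample_reference_pixels(img: Image, stride: int) -> Dict[Coord, int]:
    """Assumed enhancement layer: pixels on a regular grid."""
    h, w = len(img), len(img[0])
    return {(y, x): img[y][x]
            for y in range(0, h, stride)
            for x in range(0, w, stride)}


def encode(img: Image, with_enhancement: bool = True, stride: int = 2) -> ScalableCode:
    """Always emit the base layer; add the enhancement layer on request."""
    enh = sample_reference_pixels(img, stride) if with_enhancement else None
    return ScalableCode(len(img), len(img[0]), extract_edges(img), stride, enh)


def decode_for_machine(code: ScalableCode) -> Image:
    """Machine path: needs only the base layer, returns a binary edge map."""
    edge_map = [[0] * code.width for _ in range(code.height)]
    for y, x in code.base_edges:
        edge_map[y][x] = 1
    return edge_map


def decode_for_human(code: ScalableCode) -> Image:
    """Human path: placeholder for the generative decoder (nearest-sample fill)."""
    if code.enhancement is None:
        raise ValueError("enhancement layer required for the human-vision path")
    s = code.stride
    return [[code.enhancement[((y // s) * s, (x // s) * s)]
             for x in range(code.width)]
            for y in range(code.height)]
```

The point of the sketch is the scalability property: `decode_for_machine` gives the same result whether or not the enhancement layer was transmitted, so a machine-vision consumer can stop after the base layer, while a human-vision consumer also reads the reference pixels.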
