Paper Title
PØDA: Prompt-driven Zero-shot Domain Adaptation
Paper Authors
Paper Abstract
Domain adaptation has been vastly investigated in computer vision but still requires access to target images at train time, which might be intractable in some uncommon conditions. In this paper, we propose the task of "Prompt-driven Zero-shot Domain Adaptation", where we adapt a model trained on a source domain using only a general description in natural language of the target domain, i.e., a prompt. First, we leverage a pretrained contrastive vision-language model (CLIP) to optimize affine transformations of source features, steering them towards the target text embedding while preserving their content and semantics. To achieve this, we propose Prompt-driven Instance Normalization (PIN). Second, we show that these prompt-driven augmentations can be used to perform zero-shot domain adaptation for semantic segmentation. Experiments demonstrate that our method significantly outperforms CLIP-based style transfer baselines on several datasets for the downstream task at hand, even surpassing one-shot unsupervised domain adaptation. A similar boost is observed on object detection and image classification. The code is available at https://github.com/astra-vision/PODA.
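
The abstract only names the PIN mechanism, so below is a minimal, self-contained PyTorch sketch of what the prompt-driven optimization could look like. It assumes OpenAI's `clip` package is installed; the random feature map, the frozen linear `head` standing in for the remainder of CLIP's image encoder, and all hyperparameters (prompt text, learning rate, step count) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of Prompt-driven Instance Normalization (PIN), assuming
# PyTorch and OpenAI's `clip` package (https://github.com/openai/CLIP).
# `feat` and `head` are stand-ins: in the paper, features come from early
# layers of CLIP's image backbone and the remaining layers embed them.
import torch
import torch.nn as nn
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("RN50", device=device)

def pin(feat, mu, sigma, eps=1e-5):
    """AdaIN-style affine transform: strip the per-channel statistics of
    `feat` (shape B,C,H,W) to keep content, then re-style the result with
    optimizable statistics (mu, sigma)."""
    mean = feat.mean(dim=(2, 3), keepdim=True)
    std = feat.std(dim=(2, 3), keepdim=True) + eps
    return sigma[None, :, None, None] * (feat - mean) / std \
        + mu[None, :, None, None]

# CLIP text embedding of the target-domain prompt (example prompt assumed).
text = clip.tokenize(["driving at night"]).to(device)
with torch.no_grad():
    text_emb = F.normalize(model.encode_text(text).float(), dim=-1)

# Stand-ins: a source feature map and a frozen head into CLIP's 1024-d
# embedding space (RN50); a real implementation would reuse CLIP layers.
feat = torch.randn(1, 256, 56, 56, device=device)
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                     nn.Linear(256, 1024)).to(device)
for p in head.parameters():
    p.requires_grad_(False)

# Initialize (mu, sigma) from the source statistics, then steer them so the
# styled features' image embedding moves toward the prompt embedding.
mu = feat.mean(dim=(2, 3))[0].detach().clone().requires_grad_(True)
sigma = feat.std(dim=(2, 3))[0].detach().clone().requires_grad_(True)
opt = torch.optim.SGD([mu, sigma], lr=1.0)  # assumed hyperparameters

for _ in range(100):
    styled = pin(feat, mu, sigma)
    img_emb = F.normalize(head(styled), dim=-1)
    loss = (1.0 - img_emb @ text_emb.T).mean()  # cosine distance to prompt
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The optimized (mu, sigma) pairs can then serve as prompt-driven augmentations: restyling source features with them yields target-like training samples for fine-tuning the downstream segmenter, detector, or classifier, without ever seeing a target image.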