Paper Title

Fine-tuning Image Transformers using Learnable Memory

Authors

Mark Sandler, Andrey Zhmoginov, Max Vladymyrov, Andrew Jackson

Abstract

In this paper we propose augmenting Vision Transformer models with learnable memory tokens. Our approach allows the model to adapt to new tasks, using few parameters, while optionally preserving its capabilities on previously learned tasks. At each layer we introduce a set of learnable embedding vectors that provide contextual information useful for specific datasets. We call these "memory tokens". We show that augmenting a model with just a handful of such tokens per layer significantly improves accuracy when compared to conventional head-only fine-tuning, and performs only slightly below the significantly more expensive full fine-tuning. We then propose an attention-masking approach that enables extension to new downstream tasks with computation reuse. In this setup, in addition to being parameter-efficient, models can execute both old and new tasks as part of a single inference at a small incremental cost.
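
Below is a minimal, hypothetical sketch of how a single pre-norm ViT encoder layer could be augmented with learnable memory tokens in the spirit of the abstract: the memory embeddings serve as extra keys and values for self-attention, while their own outputs are discarded so the sequence length passed to the next layer is unchanged. The class name, dimensions, initialization, and block structure are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MemoryAugmentedEncoderLayer(nn.Module):
    """Pre-norm transformer encoder layer with learnable memory tokens.

    Memory tokens are concatenated to the (normalized) input only as
    additional keys/values for self-attention; queries come from the
    regular tokens, so the output sequence length is unchanged.
    Illustrative sketch only; hyperparameters are assumptions.
    """

    def __init__(self, dim=768, num_heads=12, num_memory_tokens=4):
        super().__init__()
        # Per-layer learnable memory embeddings (the only new parameters).
        self.memory = nn.Parameter(torch.randn(1, num_memory_tokens, dim) * 0.02)
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):                       # x: (batch, seq, dim)
        mem = self.memory.expand(x.size(0), -1, -1)
        h = self.norm1(x)
        # Regular tokens attend over themselves plus the memory tokens.
        kv = torch.cat([h, self.norm1(mem)], dim=1)
        attn_out, _ = self.attn(h, kv, kv, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.norm2(x))
        return x                                # shape unchanged: (batch, seq, dim)
```

During fine-tuning on a new dataset, only these memory embeddings (plus the classification head) would be trained while the backbone weights stay frozen, which is what keeps the per-task parameter cost small.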
