Improving Few-Shot Image Classification Using Machine- and User-Generated Natural Language Descriptions

Paper Title

Improving Few-Shot Image Classification Using Machine- and User-Generated Natural Language Descriptions

Authors

Kosuke Nishida, Kyosuke Nishida, Shuichi Nishioka

Abstract

Humans can obtain the knowledge of novel visual concepts from language descriptions, and we thus use the few-shot image classification task to investigate whether a machine learning model can have this capability. Our proposed model, LIDE (Learning from Image and DEscription), has a text decoder to generate the descriptions and a text encoder to obtain the text representations of machine- or user-generated descriptions. We confirmed that LIDE with machine-generated descriptions outperformed baseline models. Moreover, the performance was improved further with high-quality user-generated descriptions. The generated descriptions can be viewed as the explanations of the model's predictions, and we observed that such explanations were consistent with prediction results. We also investigated why the language description improved the few-shot image classification performance by comparing the image representations and the text representations in the feature spaces.
