Paper Title
Recipe2Vec: Multi-modal Recipe Representation Learning with Graph Neural Networks
Paper Authors
Paper Abstract
Learning effective recipe representations is essential in food studies. Unlike image-based recipe retrieval or structured text embeddings, which have been studied extensively, the combined effect of multi-modal information (i.e., recipe images, text, and relational data) has received less attention. In this paper, we formalize the problem of multi-modal recipe representation learning, which integrates visual, textual, and relational information into recipe embeddings. In particular, we first present Large-RG, a new recipe graph dataset with over half a million nodes, making it the largest recipe graph to date. We then propose Recipe2Vec, a novel graph neural network-based recipe embedding model that captures multi-modal information. Additionally, we introduce an adversarial attack strategy to ensure stable learning and improve performance. Finally, we design a joint objective function combining node classification and adversarial learning to optimize the model. Extensive experiments demonstrate that Recipe2Vec outperforms state-of-the-art baselines on two classic food study tasks, i.e., cuisine category classification and region prediction. The dataset and code are available at https://github.com/meettyj/Recipe2Vec.
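To make the joint objective described above more concrete, below is a minimal, hypothetical sketch of how a node-classification loss can be combined with an adversarial term obtained by perturbing fused multi-modal recipe features. It is not the authors' implementation (Recipe2Vec uses a graph neural network over the recipe graph); the class `RecipeEncoder` and the parameters `fuse_dim`, `num_cuisines`, `epsilon`, and `alpha` are illustrative assumptions.

```python
# Hypothetical sketch of a joint objective: node-classification loss plus an
# FGSM-style adversarial term on perturbed multi-modal recipe features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecipeEncoder(nn.Module):
    """Toy stand-in for a GNN encoder: maps fused visual/textual/relational
    recipe features to cuisine-class logits."""
    def __init__(self, fuse_dim: int = 128, hidden: int = 64, num_cuisines: int = 10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(fuse_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, num_cuisines)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.encoder(x))

def joint_loss(model: RecipeEncoder, x: torch.Tensor, y: torch.Tensor,
               epsilon: float = 0.01, alpha: float = 0.5) -> torch.Tensor:
    """Classification loss plus an adversarial loss on gradient-sign-perturbed inputs."""
    x = x.clone().requires_grad_(True)
    clean_loss = F.cross_entropy(model(x), y)
    # Gradient of the clean loss w.r.t. the input features (graph kept for backward).
    grad, = torch.autograd.grad(clean_loss, x, retain_graph=True)
    x_adv = (x + epsilon * grad.sign()).detach()      # adversarially perturbed features
    adv_loss = F.cross_entropy(model(x_adv), y)
    return clean_loss + alpha * adv_loss

# Usage on random toy data (assumed shapes for illustration only).
model = RecipeEncoder()
x = torch.randn(32, 128)                # fused multi-modal recipe features
y = torch.randint(0, 10, (32,))         # cuisine-category labels
loss = joint_loss(model, x, y)
loss.backward()
```

The weighting factor `alpha` balances the two terms; in practice the adversarial component is tuned so that it stabilizes training rather than dominating the classification objective.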