Paper Title
Hyper-relationship Learning Network for Scene Graph Generation
Paper Authors
Paper Abstract
Generating informative scene graphs from images requires integrating and reasoning over various graph components, i.e., objects and relationships. However, current scene graph generation (SGG) methods, including unbiased SGG methods, still struggle to predict informative relationships due to the lack of 1) high-level inference, such as transitive inference between relationships, and 2) efficient mechanisms that can incorporate all interactions among graph components. To address these issues, we devise a hyper-relationship learning network, termed HLN, for SGG. Specifically, the proposed HLN stems from hypergraphs, and two graph attention networks (GATs) are designed to infer relationships: 1) the object-relationship GAT, or OR-GAT, to explore interactions between objects and relationships, and 2) the hyper-relationship GAT, or HR-GAT, to integrate transitive inference over hyper-relationships, i.e., the sequential relationships among three objects that enable transitive reasoning. As a result, HLN significantly improves the performance of scene graph generation by integrating and reasoning over object interactions, relationship interactions, and transitive inference of hyper-relationships. We evaluate HLN on the most popular SGG dataset, the Visual Genome dataset, and the experimental results demonstrate its superiority over recent state-of-the-art methods. For example, the proposed HLN improves the recall per relationship from 11.3\% to 13.1\% and the recall per image from 19.8\% to 34.9\%. We will release the source code and pretrained models on GitHub.
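The abstract gives no implementation details, but a minimal sketch can illustrate the two mechanisms it names. The following is a hypothetical PyTorch sketch under stated assumptions, not the authors' released code: ObjectRelationshipGAT is an assumed name for a layer in which each relationship feature attends over its subject and object features (the OR-GAT idea), and build_hyper_relationships is an assumed helper that enumerates chained relationship pairs (i -> j, j -> k) through a shared object, i.e., the sequential relationships among three objects that HR-GAT would reason over. All names, dimensions, and the scaled dot-product attention form are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectRelationshipGAT(nn.Module):
    """Hypothetical OR-GAT-style layer: each relationship feature attends
    over the features of its subject and object endpoints (the scaled
    dot-product form is an assumption; the paper only names a GAT)."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)  # query from the relationship
        self.k = nn.Linear(dim, dim)  # keys from the two endpoint objects
        self.v = nn.Linear(dim, dim)  # values from the two endpoint objects

    def forward(self, obj_feats, rel_feats, pairs):
        # obj_feats: (N, d) object features; rel_feats: (M, d) relationship
        # features; pairs: (M, 2) long tensor of (subject, object) indices.
        endpoints = obj_feats[pairs]                 # (M, 2, d)
        q = self.q(rel_feats).unsqueeze(1)           # (M, 1, d)
        k, v = self.k(endpoints), self.v(endpoints)  # (M, 2, d) each
        attn = F.softmax((q * k).sum(-1) / k.size(-1) ** 0.5, dim=-1)
        msg = (attn.unsqueeze(-1) * v).sum(dim=1)    # (M, d) aggregated message
        return rel_feats + msg                       # residual update

def build_hyper_relationships(pairs):
    """Assumed construction of hyper-relationships: pairs of relationships
    (i -> j) and (j -> k) that chain through a shared middle object."""
    rows = pairs.tolist()
    chains = [(a, b)
              for a, (_, j) in enumerate(rows)
              for b, (j2, _) in enumerate(rows)
              if j == j2 and a != b]
    if not chains:
        return torch.empty(0, 2, dtype=torch.long)
    return torch.tensor(chains, dtype=torch.long)

# Toy usage: 4 objects and 3 relationships 0->1, 1->2, 2->3.
obj = torch.randn(4, 64)
rel = torch.randn(3, 64)
pairs = torch.tensor([[0, 1], [1, 2], [2, 3]])
layer = ObjectRelationshipGAT(64)
updated_rel = layer(obj, rel, pairs)     # (3, 64)
print(build_hyper_relationships(pairs))  # tensor([[0, 1], [1, 2]])
```

In the full HLN, the enumerated hyper-relationships would presumably be embedded and refined by the second attention network (HR-GAT); the sketch stops at enumerating them, since the abstract gives no further detail.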