Paper Title
Discovering Dynamic Salient Regions for Spatio-Temporal Graph Neural Networks
Paper Authors
Paper Abstract
Graph Neural Networks are perfectly suited to capture latent interactions between various entities in the spatio-temporal domain (e.g. videos). However, when an explicit structure is not available, it is not obvious what atomic elements should be represented as nodes. Current works generally use pre-trained object detectors or fixed, predefined regions to extract graph nodes. Improving upon this, our proposed model learns nodes that dynamically attach to well-delimited salient regions, which are relevant for a higher-level task, without using any object-level supervision. Constructing these localized, adaptive nodes gives our model an inductive bias towards object-centric representations, and we show that it discovers regions that are well correlated with objects in the video. In extensive ablation studies and experiments on two challenging datasets, we show superior performance to previous graph neural network models for video classification.
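The abstract does not specify how nodes attach to salient regions. As a rough, hypothetical sketch of the general idea only (not the paper's actual method), one could pool per-node features from a frame's feature map with learned spatial attention, so that each graph node softly focuses on one region of the frame; all names, shapes, and the query-based attention form below are illustrative assumptions:

```python
import numpy as np

def extract_salient_nodes(feature_map, node_queries):
    """Pool one feature vector per graph node from a frame feature map,
    using spatial attention -- an illustrative sketch of attaching nodes
    to salient regions without object-level supervision.

    feature_map:  (H, W, C) frame features (e.g. from a CNN backbone).
    node_queries: (N, C) learned query vectors, one per graph node.
    Returns (N, C) node features and (N, H, W) spatial attention maps.
    """
    H, W, C = feature_map.shape
    flat = feature_map.reshape(H * W, C)           # (HW, C) spatial positions
    scores = node_queries @ flat.T / np.sqrt(C)    # (N, HW) query-key scores
    # Softmax over spatial positions: each node concentrates on a region.
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    nodes = attn @ flat                            # (N, C) attention-pooled features
    return nodes, attn.reshape(-1, H, W)

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 8, 16))             # toy 8x8 feature map, C=16
queries = rng.standard_normal((4, 16))             # 4 graph nodes
nodes, attn = extract_salient_nodes(fmap, queries)
print(nodes.shape, attn.shape)  # (4, 16) (4, 8, 8)
```

The pooled node features would then feed a spatio-temporal graph network; since the attention is differentiable, the regions can be learned end-to-end from the classification loss alone, consistent with the abstract's claim of using no object-level supervision.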