Paper Title

Neural Systematic Binder

Authors

Gautam Singh, Yeongbin Kim, Sungjin Ahn

Abstract

The key to high-level cognition is believed to be the ability to systematically manipulate and compose pieces of knowledge. While text naturally provides token-like structured knowledge representations, how to obtain them for unstructured modalities such as scene images remains elusive. In this paper, we propose a neural mechanism called Neural Systematic Binder, or SysBinder, for constructing a novel structured representation called Block-Slot Representation. In Block-Slot Representation, object-centric representations known as slots are constructed by composing a set of independent factor representations called blocks, to facilitate systematic generalization. SysBinder obtains this structure in an unsupervised way by alternately applying two different binding principles: spatial binding for spatial modularity across the full scene and factor binding for factor modularity within an object. SysBinder is a simple, deterministic, and general-purpose layer that can be applied as a drop-in module in any neural network and on any modality. In experiments, we find that SysBinder provides significantly better factor disentanglement within the slots than conventional object-centric methods, including, for the first time, in visually complex scene images such as CLEVR-Tex. Furthermore, we demonstrate factor-level systematicity in controlled scene generation by decoding unseen factor combinations.
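The block-slot idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the helper names (`flatten`, `swap_block`), the two-block layout, and all numeric values are hypothetical. In SysBinder, which block captures which factor is learned without supervision, so the "shape" and "color" labels below are an assumption made only for readability. The sketch shows a slot as an ordered set of blocks, and how exchanging one block between two slots yields a factor combination that neither original slot contained.

```python
from typing import List, Tuple

Block = List[float]   # one factor representation
Slot = List[Block]    # one object: an ordered list of blocks


def flatten(slot: Slot) -> List[float]:
    """Concatenate a slot's blocks into a single flat vector."""
    return [value for block in slot for value in block]


def swap_block(slot_a: Slot, slot_b: Slot, index: int) -> Tuple[Slot, Slot]:
    """Return copies of both slots with the block at `index` exchanged."""
    new_a = [list(block) for block in slot_a]
    new_b = [list(block) for block in slot_b]
    new_a[index], new_b[index] = new_b[index], new_a[index]
    return new_a, new_b


# Hypothetical layout (an assumption): block 0 = shape, block 1 = color.
red_cube: Slot = [[1.0, 0.0], [0.9, 0.1]]
blue_ball: Slot = [[0.0, 1.0], [0.1, 0.9]]

# Exchanging the color block gives two combinations not present in the inputs.
blue_cube, red_ball = swap_block(red_cube, blue_ball, index=1)
```

In a full model, the recombined slots would be passed to a decoder to render the new scene; that step is omitted here.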
