A Framework for Learning Invariant Physical Relations in Multimodal Sensory Processing

Paper title

A Framework for Learning Invariant Physical Relations in Multimodal Sensory Processing

Authors

Xiaorui, Du; Erdem, Yavuzhan; Schweizer, Immanuel; Axenie, Cristian

Abstract

Perceptual learning enables humans to recognize and represent stimuli invariant to various transformations and build a consistent representation of the self and physical world. Such representations preserve the invariant physical relations among the multiple perceived sensory cues. This work is an attempt to exploit these principles in an engineered system. We design a novel neural network architecture capable of learning, in an unsupervised manner, relations among multiple sensory cues. The system combines computational principles, such as competition, cooperation, and correlation, in a neurally plausible computational substrate. It achieves that through a parallel and distributed processing architecture in which the relations among the multiple sensory quantities are extracted from time-sequenced data. We describe the core system functionality when learning arbitrary non-linear relations in low-dimensional sensory data. Here, an initial benefit arises from the fact that such a network can be engineered in a relatively straightforward way without prior information about the sensors and their interactions. Moreover, alleviating the need for tedious modelling and parametrization, the network converges to a consistent description of any arbitrary high-dimensional multisensory setup. We demonstrate this through a real-world learning problem, where, from standard RGB camera frames, the network learns the relations between physical quantities such as light intensity, spatial gradient, and optical flow, describing a visual scene. Overall, the benefits of such a framework lie in the capability to learn non-linear pairwise relations among sensory streams in an architecture that is stable under noise and missing sensor input.
