Meta-Learning of Structured Task Distributions in Humans and Machines

Paper Title

Meta-Learning of Structured Task Distributions in Humans and Machines

Authors

Sreejan Kumar, Ishita Dasgupta, Jonathan D. Cohen, Nathaniel D. Daw, Thomas L. Griffiths

Abstract

In recent years, meta-learning, in which a model is trained on a family of tasks (i.e. a task distribution), has emerged as an approach to training neural networks to perform tasks that were previously assumed to require structured representations, making strides toward closing the gap between humans and machines. However, we argue that evaluating meta-learning remains a challenge, and can miss whether meta-learning actually uses the structure embedded within the tasks. These meta-learners might therefore still be significantly different from human learners. To demonstrate this difference, we first define a new meta-reinforcement learning task in which a structured task distribution is generated using a compositional grammar. We then introduce a novel approach to constructing a "null task distribution" with the same statistical complexity as this structured task distribution but without the explicit rule-based structure used to generate the structured task. We train a standard meta-learning agent, a recurrent network trained with model-free reinforcement learning, and compare it with human performance across the two task distributions. We find a double dissociation in which humans do better in the structured task distribution whereas agents do better in the null task distribution -- despite comparable statistical complexity. This work highlights that multiple strategies can achieve reasonable meta-test performance, and that careful construction of control task distributions is a valuable way to understand which strategies meta-learners acquire, and how they might differ from humans.
