Paper title
Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks
Paper authors
Paper abstract
Human language offers a powerful window into our thoughts -- we tell stories, give explanations, and express our beliefs and goals through words. Abundant evidence also suggests that language plays a developmental role in structuring our learning. Here, we ask: how much of human-like thinking can be captured by learning statistical patterns in language alone? We first contribute a new challenge benchmark for comparing humans and distributional large language models (LLMs). Our benchmark contains two problem-solving domains (planning and explanation generation) and is designed to require generalization to new, out-of-distribution problems expressed in language. We find that humans are far more robust than LLMs on this benchmark. Next, we propose a hybrid Parse-and-Solve model, which augments distributional LLMs with a structured symbolic reasoning module. We find that this model shows more robust adaptation to out-of-distribution planning problems, demonstrating the promise of hybrid AI models for more human-like reasoning.
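The hybrid Parse-and-Solve model described above pairs a distributional LLM with a structured symbolic reasoning module. A minimal sketch of that pipeline shape, with a hand-written stub standing in for the LLM parser and a toy room-navigation domain (all names and the domain are illustrative assumptions, not the paper's actual interface):

```python
# Hypothetical sketch of the Parse-and-Solve idea: a "parser" maps a
# language-stated planning problem to a symbolic form (initial state,
# goal, operators), and a generic symbolic search solves it.
from collections import deque

def parse_problem(text):
    """Stub 'LLM' parse: returns (initial state, goal, operators).
    In the hybrid model, an LLM would emit this structure from free text."""
    start, goal = 0, 3
    moves = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # room adjacency
    return start, goal, moves

def solve(start, goal, moves):
    """Symbolic module: breadth-first search for a shortest plan."""
    frontier = deque([(start, [start])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for nxt in moves[state]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None  # no plan exists

plan = solve(*parse_problem("Get from room 0 to room 3."))
print(plan)  # [0, 1, 2, 3]
```

The point of the split is robustness: out-of-distribution surface wording only has to survive the parse step, while the solver operates over the same symbolic space regardless of how the problem was phrased.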