Paper title
Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks
Paper authors
Paper abstract
Human language offers a powerful window into our thoughts -- we tell stories, give explanations, and express our beliefs and goals through words. Abundant evidence also suggests that language plays a developmental role in structuring our learning. Here, we ask: how much of human-like thinking can be captured by learning statistical patterns in language alone? We first contribute a new challenge benchmark for comparing humans and distributional large language models (LLMs). Our benchmark contains two problem-solving domains (planning and explanation generation) and is designed to require generalization to new, out-of-distribution problems expressed in language. We find that humans are far more robust than LLMs on this benchmark. Next, we propose a hybrid Parse-and-Solve model, which augments distributional LLMs with a structured symbolic reasoning module. We find that this model shows more robust adaptation to out-of-distribution planning problems, demonstrating the promise of hybrid AI models for more human-like reasoning.
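The hybrid Parse-and-Solve model described above pairs a distributional LLM with a structured symbolic reasoning module. A minimal sketch of that pipeline shape, with a hand-written stub standing in for the LLM parser and a toy room-navigation domain (all names and the domain are illustrative assumptions, not the paper's actual interface):

```python
# Hypothetical sketch of the Parse-and-Solve idea: a "parser" maps a
# language-stated planning problem to a symbolic form (initial state,
# goal, operators), and a generic symbolic search solves it.
from collections import deque

def parse_problem(text):
    """Stub 'LLM' parse: returns (initial state, goal, operators).
    In the hybrid model, an LLM would emit this structure from free text."""
    start, goal = 0, 3
    moves = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # room adjacency
    return start, goal, moves

def solve(start, goal, moves):
    """Symbolic module: breadth-first search for a shortest plan."""
    frontier = deque([(start, [start])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for nxt in moves[state]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None  # no plan exists

plan = solve(*parse_problem("Get from room 0 to room 3."))
print(plan)  # [0, 1, 2, 3]
```

The point of the split is robustness: out-of-distribution surface wording only has to survive the parse step, while the solver operates over the same symbolic space regardless of how the problem was phrased.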