Paper Title
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
Paper Authors
Paper Abstract
Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks that require solving problems harder than the exemplars shown in the prompts. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most prompting. The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence. Solving each subproblem is facilitated by the answers to previously solved subproblems. Our experimental results on tasks related to symbolic manipulation, compositional generalization, and math reasoning reveal that least-to-most prompting is capable of generalizing to more difficult problems than those seen in the prompts. A notable finding is that when the GPT-3 code-davinci-002 model is used with least-to-most prompting, it can solve the compositional generalization benchmark SCAN in any split (including length split) with an accuracy of at least 99% using just 14 exemplars, compared to only 16% accuracy with chain-of-thought prompting. This is particularly noteworthy because neural-symbolic models in the literature that specialize in solving SCAN are trained on the entire training set containing over 15,000 examples. We have included prompts for all the tasks in the Appendix.
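The abstract's description of the method maps onto a simple two-stage loop: first ask the model to decompose the problem, then solve the subproblems in order while feeding earlier answers into later prompts. Below is a minimal Python sketch of that loop, under stated assumptions: `complete` is a hypothetical placeholder for any text-completion LLM call (e.g., one wrapping code-davinci-002), and the prompt formats and newline-based parsing are illustrative, not the paper's exact Appendix prompts.

```python
# Minimal sketch of least-to-most prompting, assuming a generic
# text-completion function `complete`. The prompt formats and the
# newline-based parsing are illustrative assumptions, not the exact
# prompts from the paper's Appendix.
from typing import Callable, List


def least_to_most(
    problem: str,
    decompose_exemplars: str,
    solve_exemplars: str,
    complete: Callable[[str], str],
) -> str:
    """Decompose `problem` into subproblems, then solve them in sequence."""
    # Stage 1: problem decomposition. Few-shot exemplars show the model
    # how to break a complex problem into a list of simpler subproblems.
    decomposition = complete(
        f"{decompose_exemplars}\n\nQ: {problem}\n"
        "A: To solve this, we need to first answer:"
    )
    subproblems: List[str] = [
        line.strip() for line in decomposition.split("\n") if line.strip()
    ]

    # Stage 2: sequential subproblem solving. Each solved (question,
    # answer) pair is appended to the context, so the answers to earlier
    # subproblems are available when solving the later, harder ones. The
    # original problem is asked last, building on all earlier answers.
    context = f"{solve_exemplars}\n\n{problem}"
    answer = ""
    for sub in subproblems + [problem]:
        answer = complete(f"{context}\nQ: {sub}\nA:").strip()
        context += f"\nQ: {sub}\nA: {answer}"
    return answer
```

By contrast, chain-of-thought prompting would issue a single completion call for the whole problem; the explicit decomposition stage is what lets least-to-most prompting generalize from easy exemplars to harder test problems, as the abstract's SCAN result illustrates.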