Paper Title

PuzzLing Machines: A Challenge on Learning From Small Data

Paper Authors

Gözde Gül Şahin, Yova Kementchedjhieva, Phillip Rust, Iryna Gurevych

Abstract

Deep neural models have repeatedly proved excellent at memorizing surface patterns from large datasets for various ML and NLP benchmarks. They struggle to achieve human-like thinking, however, because they lack the skill of iterative reasoning upon knowledge. To expose this problem in a new light, we introduce a challenge on learning from small data, PuzzLing Machines, which consists of Rosetta Stone puzzles from Linguistic Olympiads for high school students. These puzzles are carefully designed to contain only the minimal amount of parallel text necessary to deduce the form of unseen expressions. Solving them does not require external information (e.g., knowledge bases, visual signals) or linguistic expertise, but meta-linguistic awareness and deductive skills. Our challenge contains around 100 puzzles covering a wide range of linguistic phenomena from 81 languages. We show that both simple statistical algorithms and state-of-the-art deep neural models perform inadequately on this challenge, as expected. We hope that this benchmark, available at https://ukplab.github.io/PuzzLing-Machines/, inspires further efforts towards a new paradigm in NLP---one that is grounded in human-like reasoning and understanding.
