论文标题
这不是机器可以学习的,而是我们无法教的
It's Not What Machines Can Learn, It's What We Cannot Teach
论文作者
论文摘要
深度神经网络可以学会解决任何任务,特别是复杂性的问题吗?这个问题引起了很多兴趣,最近的作品解决了计算上的艰巨任务,例如旅行推销员问题和令人满意。在这项工作中,我们对这个问题提供了不同的看法。考虑到$ \ textIt {np} \ neq \ textit {conp} $的常见假设,我们证明了$ \ textIt {np} $的任何多项式时间示例生成器 - 实际上,从更容易的子问题中出发。我们从经验上探讨了一个案例研究,结合性查询遏制,并展示了共同的数据生成技术如何产生有偏见的数据集,使从业人员过度估计模型的准确性。我们的结果表明,需要对目标分布进行密集均匀采样的训练的机器学习方法不能用于解决计算上的硬问题,原因是难以生成足够大且无偏见的训练集。
Can deep neural networks learn to solve any task, and in particular problems of high complexity? This question attracts a lot of interest, with recent works tackling computationally hard tasks such as the traveling salesman problem and satisfiability. In this work we offer a different perspective on this question. Given the common assumption that $\textit{NP} \neq \textit{coNP}$ we prove that any polynomial-time sample generator for an $\textit{NP}$-hard problem samples, in fact, from an easier sub-problem. We empirically explore a case study, Conjunctive Query Containment, and show how common data generation techniques generate biased datasets that lead practitioners to over-estimate model accuracy. Our results suggest that machine learning approaches that require training on a dense uniform sampling from the target distribution cannot be used to solve computationally hard problems, the reason being the difficulty of generating sufficiently large and unbiased training sets.