Paper title
Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation
Authors
Abstract
Although the problem of hallucinations in neural machine translation (NMT) has received some attention, research on this highly pathological phenomenon lacks solid ground. Previous work has been limited in several ways: it often resorts to artificial settings where the problem is amplified, it disregards some (common) types of hallucinations, and it does not validate the adequacy of detection heuristics. In this paper, we set foundations for the study of NMT hallucinations. First, we work in a natural setting, i.e., in-domain data without artificial noise in either training or inference. Next, we annotate a dataset of over 3.4k sentences, indicating different kinds of critical errors and hallucinations. Then, we turn to detection methods: we revisit methods used previously and propose using glass-box uncertainty-based detectors. Overall, we show that for preventive settings, (i) previously used methods are largely inadequate, and (ii) sequence log-probability works best and performs on par with reference-based methods. Finally, we propose DeHallucinator, a simple method for alleviating hallucinations at test time that significantly reduces the hallucinatory rate. To ease future research, we release our annotated dataset for WMT18 German-English data, along with the model, training data, and code.
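To make the sequence log-probability detector mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the NMT model exposes the probability it assigned to each generated token, that the score is normalized by translation length, and that a translation is flagged when the score falls below a threshold. The function names, the length normalization, and the threshold value of -1.0 are all illustrative assumptions; the abstract does not specify them.

```python
import math
from typing import Sequence


def seq_logprob(token_probs: Sequence[float], normalize: bool = True) -> float:
    """Sum of token log-probabilities, optionally divided by sequence length.

    token_probs: probabilities in (0, 1] that the model assigned to each
    generated token (assumed to be available from the decoder).
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    total = sum(math.log(p) for p in token_probs)
    # Length normalization is an assumption made here so that long and
    # short translations are comparable.
    return total / len(token_probs) if normalize else total


def flag_hallucination(token_probs: Sequence[float], threshold: float = -1.0) -> bool:
    """Flag a translation whose score is below a hypothetical threshold.

    In practice the threshold would have to be tuned on annotated data.
    """
    return seq_logprob(token_probs) < threshold


# A confidently generated translation versus a low-confidence one.
confident = [0.9, 0.8, 0.95]
uncertain = [0.1, 0.2, 0.05]
print(flag_hallucination(confident), flag_hallucination(uncertain))
```

Because the score uses only the model's own output probabilities, a detector of this kind needs no reference translation, which is what makes it usable in a preventive setting.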