Paper Title

A Set of Recommendations for Assessing Human-Machine Parity in Language Translation

Paper Authors

Samuel Läubli, Sheila Castilho, Graham Neubig, Rico Sennrich, Qinlan Shen, Antonio Toral

Paper Abstract

The quality of machine translation has increased remarkably over the past years, to the degree that it was found to be indistinguishable from professional human translation in a number of empirical investigations. We reassess Hassan et al.'s 2018 investigation into Chinese to English news translation, showing that the finding of human-machine parity was owed to weaknesses in the evaluation design, which is currently considered best practice in the field. We show that the professional human translations contained significantly fewer errors, and that perceived quality in human evaluation depends on the choice of raters, the availability of linguistic context, and the creation of reference translations. Our results call for revisiting current best practices to assess strong machine translation systems in general and human-machine parity in particular, for which we offer a set of recommendations based on our empirical findings.
