Paper Title

A Comprehensive Survey of Grammar Error Correction

Authors

Yu Wang, Yuelin Wang, Jie Liu, Zhuo Liu

Abstract

Grammar error correction (GEC) is an important application of natural language processing. The past decade has seen significant progress in GEC, driven by the growing popularity of machine learning and deep learning, especially in the late 2010s, when near human-level GEC systems became available. However, no prior work has recapitulated this progress as a whole. We present the first survey of GEC, giving a comprehensive retrospective of the literature in this area. We first introduce five public datasets, the data annotation schema, two important shared tasks, and four standard evaluation metrics. More importantly, we discuss four kinds of basic approaches (statistical machine translation based, neural machine translation based, classification based, and language model based), six commonly applied performance boosting techniques for GEC systems, and two data augmentation methods. Since GEC is typically viewed as a sister task of machine translation, many GEC systems are based on neural machine translation (NMT) approaches, in which a neural sequence-to-sequence model is applied. Similarly, some performance boosting techniques are adapted from machine translation and have been successfully combined with GEC systems to improve final performance. Furthermore, we analyze basic approaches, performance boosting techniques, and integrated GEC systems separately, based on their experimental results, to draw clearer patterns and conclusions. Finally, we discuss five prospective directions for future GEC research.
