Paper Title

Analysis of the Penn Korean Universal Dependency Treebank (PKT-UD): Manual Revision to Build Robust Parsing Model in Korean

Authors

Oh, Tae Hwan; Han, Ji Yoon; Choe, Hyonsu; Park, Seokwon; He, Han; Choi, Jinho D.; Han, Na-Rae; Hwang, Jena D.; Kim, Hansaem

Abstract

In this paper, we first raise important issues regarding the Penn Korean Universal Treebank (PKT-UD) and address them by manually revising the entire corpus, with the aim of producing cleaner UD annotations that are more faithful to Korean grammar. For compatibility with the rest of the UD corpora, we follow the UDv2 guidelines and extensively revise the part-of-speech tags and dependency relations to reflect the morphological features and flexible word order of Korean. We experiment with the original and revised versions of PKT-UD using transformer-based parsing models with biaffine attention. The parsing model trained on the revised corpus shows a significant improvement of 3.0% in labeled attachment score over the model trained on the previous corpus. Our error analysis demonstrates that this revision allows the parsing model to learn relations more robustly, reducing several critical errors made by the previous model.
