Title
Multi-Fusion Chinese WordNet (MCW): Compound of Machine Learning and Manual Correction
Authors
Abstract
Princeton WordNet (PWN) is a lexical-semantic network based on cognitive linguistics that has promoted the development of natural language processing. Based on PWN, five Chinese wordnets have been developed to address problems of syntax and semantics: Northeastern University Chinese WordNet (NEW), Sinica Bilingual Ontological WordNet (BOW), Southeast University Chinese WordNet (SEW), Taiwan University Chinese WordNet (CWN), and Chinese Open WordNet (COW). In using them, we found that these wordnets have low accuracy and coverage and cannot fully portray the semantic network of PWN. We therefore built a new Chinese wordnet, the Multi-Fusion Chinese WordNet (MCW), to make up for these shortcomings. The key idea is to extend SEW with the help of the Oxford bilingual dictionary and the Xinhua bilingual dictionary, and then to correct the result. More specifically, we used both machine learning and manual adjustment for the corrections, guided by two standards that we formulated. To compare lemma accuracy, we conducted experiments on three tasks: relatedness calculation, word similarity, and word sense disambiguation; we also compared coverage. The results indicate that our method improves both the coverage and the accuracy of MCW. However, there is still room for improvement, especially in the lemmas. In future work, we will continue to improve the accuracy of MCW and expand the concepts it contains.
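The abstract does not spell out how the dictionary-based extension works. As a rough illustration only, the following sketch assumes a simple scheme. It looks up each PWN synset's English lemmas in one or more bilingual dictionaries and adds every Chinese translation found as a candidate lemma to an existing SEW-like mapping. Coverage is then taken as the fraction of synsets with at least one Chinese lemma. All data, the functions `extend_wordnet` and `coverage`, and this coverage definition are hypothetical stand-ins, not the actual MCW pipeline.

```python
"""Hypothetical sketch: extend a synset -> Chinese lemma map with bilingual dictionaries."""
from typing import Dict, List, Set

# Toy data (hypothetical): PWN-style synset IDs mapped to their English lemmas.
PWN_SYNSETS: Dict[str, List[str]] = {
    "dog.n.01": ["dog", "domestic_dog"],
    "bank.n.01": ["bank"],  # riverbank sense
    "run.v.01": ["run"],
}

# Toy stand-in for an existing Chinese wordnet such as SEW: synset ID -> Chinese lemmas.
BASE_WORDNET: Dict[str, Set[str]] = {
    "dog.n.01": {"狗"},
}

# Toy stand-ins for bilingual dictionaries: English word -> Chinese translations.
DICTIONARIES: List[Dict[str, List[str]]] = [
    {"dog": ["狗", "犬"], "bank": ["银行", "河岸"]},
    {"domestic_dog": ["家犬"], "run": ["跑", "奔跑"]},
]


def extend_wordnet(
    synsets: Dict[str, List[str]],
    base: Dict[str, Set[str]],
    dictionaries: List[Dict[str, List[str]]],
) -> Dict[str, Set[str]]:
    """Return a new map holding the base lemmas plus every dictionary translation
    of each synset's English lemmas. The added lemmas are unverified candidates."""
    # Copy the base sets so the input mapping is not mutated.
    extended = {sid: set(lemmas) for sid, lemmas in base.items()}
    for sid, english_lemmas in synsets.items():
        candidates = extended.setdefault(sid, set())
        for word in english_lemmas:
            for dictionary in dictionaries:
                candidates.update(dictionary.get(word, []))
    # Keep only synsets that ended up with at least one Chinese lemma.
    return {sid: lemmas for sid, lemmas in extended.items() if lemmas}


def coverage(wordnet: Dict[str, Set[str]], synsets: Dict[str, List[str]]) -> float:
    """Fraction of the given synsets that have at least one Chinese lemma."""
    covered = sum(1 for sid in synsets if wordnet.get(sid))
    return covered / len(synsets)


if __name__ == "__main__":
    extended = extend_wordnet(PWN_SYNSETS, BASE_WORDNET, DICTIONARIES)
    print("base coverage:", coverage(BASE_WORDNET, PWN_SYNSETS))
    print("extended coverage:", coverage(extended, PWN_SYNSETS))
    print("bank.n.01 candidates:", sorted(extended["bank.n.01"]))
```

In this toy run, coverage rises from one synset in three to all three. However, a plain dictionary lookup ignores word sense, so the riverbank synset also picks up 银行 ("bank" as a financial institution). Noise of this kind is what a correction stage, described in the abstract as machine learning plus manual adjustment, would have to remove.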