Paper Title

Shallow Discourse Annotation for Chinese TED Talks

Authors

Long, Wanqiu; Cai, Xinyi; Reid, James E. M.; Webber, Bonnie; Xiong, Deyi

Abstract

Text corpora annotated with language-related properties are an important resource for the development of Language Technology. The current work contributes a new resource for Chinese Language Technology and for Chinese-English translation, in the form of a set of TED talks (some originally given in English, some in Chinese) that have been annotated with discourse relations in the style of the Penn Discourse TreeBank, adapted to properties of Chinese text that are not present in English. The resource is currently unique in annotating discourse-level properties of planned spoken monologues rather than of written text. An inter-annotator agreement study demonstrates that the annotation scheme is able to achieve highly reliable results.