Improving Joint Layer RNN based Keyphrase Extraction by Using Syntactical Features

Paper title

Improving Joint Layer RNN based Keyphrase Extraction by Using Syntactical Features

Authors

Miftahul Mahfuzh, Sidik Soleman, Ayu Purwarianti

Abstract

Keyphrase extraction, the task of identifying important words or phrases in a text, is a crucial step in identifying the main topics when analyzing text from social media platforms. In our study, we focus on text written in Indonesian and taken from Twitter. The original joint layer recurrent neural network (JRNN) outputs a single sequence of keywords and uses only word embeddings. Here we propose modifying the input layer of the JRNN so that it extracts more than one sequence of keywords, using additional syntactical features: part of speech, named entity types, and dependency structures. Because a JRNN generally requires a large number of training examples, and creating those examples is expensive, we used a data augmentation method to increase the number of training examples. Our experiment showed that our method outperformed the baseline methods, achieving 0.9597 in accuracy and 0.7691 in F1.
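
The abstract does not say how the syntactical features are encoded or combined with the word embedding. As a minimal sketch, assuming each feature is one-hot encoded and concatenated to the embedding before it reaches the input layer, the per-token input vector could be built as follows. The tag inventories, the function names `one_hot` and `build_input_vector`, and the toy embedding values are all hypothetical and are not taken from the paper.

```python
# Hypothetical tag inventories; the abstract does not list the paper's actual tag sets.
POS_TAGS = ["NOUN", "VERB", "ADJ", "OTHER"]
NE_TAGS = ["O", "PERSON", "LOCATION", "ORGANIZATION"]
DEP_LABELS = ["root", "nsubj", "obj", "other"]


def one_hot(label, inventory):
    """Return a one-hot list marking the position of `label` in `inventory`."""
    return [1.0 if label == item else 0.0 for item in inventory]


def build_input_vector(word_embedding, pos, ne, dep):
    """Concatenate a word embedding with one-hot syntactic features.

    Assumption: simple concatenation; the paper may combine features differently.
    """
    return (
        list(word_embedding)
        + one_hot(pos, POS_TAGS)
        + one_hot(ne, NE_TAGS)
        + one_hot(dep, DEP_LABELS)
    )


# Toy 3-dimensional embedding for a single token (made-up numbers).
vector = build_input_vector([0.1, -0.2, 0.3], pos="NOUN", ne="LOCATION", dep="nsubj")
print(len(vector))  # 3 embedding dims + 4 + 4 + 4 feature dims = 15
```

Under this assumption, the input dimension grows by the combined size of the three tag inventories, while the rest of the network can stay unchanged.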
