Paper Title
Pre-trained Language Models for Keyphrase Generation: A Thorough Empirical Study
Paper Authors
Paper Abstract
Neural models that do not rely on pre-training have excelled in the keyphrase generation task with large annotated datasets. Meanwhile, new approaches have incorporated pre-trained language models (PLMs) for their data efficiency. However, a systematic study of how the two types of approaches compare and how different design choices affect the performance of PLM-based models is still lacking. To fill this knowledge gap and facilitate a more informed use of PLMs for keyphrase extraction and keyphrase generation, we present an in-depth empirical study. Formulating keyphrase extraction as sequence labeling and keyphrase generation as sequence-to-sequence generation, we perform extensive experiments in three domains. After showing that PLMs have competitive high-resource performance and state-of-the-art low-resource performance, we investigate important design choices, including in-domain PLMs, PLMs with different pre-training objectives, using PLMs under a parameter budget, and different formulations for present keyphrases. Further results show that (1) in-domain BERT-like PLMs can be used to build strong and data-efficient keyphrase generation models; (2) with a fixed parameter budget, prioritizing model depth over width and allocating more layers to the encoder leads to better encoder-decoder models; and (3) introducing four in-domain PLMs, we achieve competitive performance in the news domain and state-of-the-art performance in the scientific domain.
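To make the two task formulations named in the abstract concrete, below is a minimal sketch using Hugging Face Transformers: keyphrase extraction cast as sequence labeling with a BERT-like encoder, and keyphrase generation cast as sequence-to-sequence generation with an encoder-decoder PLM. The checkpoints (`bert-base-uncased`, `facebook/bart-base`) and the 3-label BIO scheme are illustrative assumptions, not the paper's exact models or label set, and the heads are untrained here, so the outputs are only structural placeholders.

```python
# Minimal sketch of the two formulations described in the abstract.
# Assumptions: generic checkpoints and an O/B-KP/I-KP tag set; not the paper's setup.
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,  # keyphrase extraction as sequence labeling
    AutoModelForSeq2SeqLM,            # keyphrase generation as seq2seq generation
)

doc = "Pre-trained language models improve data efficiency for keyphrase generation."

# (1) Keyphrase extraction as sequence labeling: a BERT-like encoder tags each
# token with BIO labels marking spans of keyphrases present in the document.
ext_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
ext_model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # O, B-KP, I-KP (untrained head, illustrative)
)
enc = ext_tok(doc, return_tensors="pt")
tag_logits = ext_model(**enc).logits  # shape (1, seq_len, 3); argmax -> BIO tags

# (2) Keyphrase generation as sequence-to-sequence generation: an encoder-decoder
# PLM reads the document and decodes a target keyphrase sequence.
gen_tok = AutoTokenizer.from_pretrained("facebook/bart-base")
gen_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
gen_ids = gen_model.generate(
    **gen_tok(doc, return_tensors="pt"), max_new_tokens=32, num_beams=4
)
print(gen_tok.decode(gen_ids[0], skip_special_tokens=True))
```

In this framing, fine-tuning the token-classification head recovers only keyphrases that appear verbatim in the text, while the seq2seq model can additionally produce absent keyphrases, which is why the abstract treats the two as distinct formulations.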