Paper Title
Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline
Paper Authors
Paper Abstract
Aligning sentences in a reference summary with their counterparts in source documents was shown to be a useful auxiliary summarization task, notably for generating training data for salience detection. Despite its assessed utility, the alignment step was mostly approached with heuristic unsupervised methods, typically ROUGE-based, and was never independently optimized or evaluated. In this paper, we propose establishing summary-source alignment as an explicit task, while introducing two major novelties: (1) applying it at the more accurate proposition span level, and (2) approaching it as a supervised classification task. To that end, we created a novel training dataset for proposition-level alignment, derived automatically from available summarization evaluation data. In addition, we crowdsourced dev and test datasets, enabling model development and proper evaluation. Utilizing these data, we present a supervised proposition alignment baseline model, showing improved alignment quality over the unsupervised approach.
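To make the contrast concrete, the sketch below illustrates the kind of heuristic, ROUGE-based unsupervised alignment the abstract argues against: each summary proposition is greedily matched to source spans by unigram-overlap ROUGE-1 F1 above a cutoff. The `align_greedy` function, the 0.3 threshold, and the toy inputs are illustrative assumptions, not the paper's actual procedure or datasets.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between two text spans."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def align_greedy(summary_propositions, source_spans, threshold=0.3):
    """Heuristic alignment: for each summary proposition, keep every
    source span whose ROUGE-1 F1 exceeds the (hypothetical) threshold."""
    alignments = []
    for prop in summary_propositions:
        scored = [(span, rouge1_f1(prop, span)) for span in source_spans]
        alignments.append((prop, [s for s, f in scored if f >= threshold]))
    return alignments


if __name__ == "__main__":
    summary = ["The company reported record quarterly profits."]
    source = [
        "Profits for the quarter reached a record high, the company said.",
        "Shares fell slightly in after-hours trading.",
    ]
    for prop, spans in align_greedy(summary, source):
        print(prop, "->", spans)
```

The supervised baseline proposed in the paper instead learns to classify candidate (summary proposition, source span) pairs as aligned or not, rather than relying on a fixed lexical-overlap score like the one above.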