Paper Title
TLDR: Extreme Summarization of Scientific Documents
Authors
Abstract
We introduce TLDR generation, a new form of extreme summarization, for scientific papers. TLDR generation involves high source compression and requires expert background knowledge and understanding of complex domain-specific language. To facilitate study on this task, we introduce SciTLDR, a new multi-target dataset of 5.4K TLDRs over 3.2K papers. SciTLDR contains both author-written and expert-derived TLDRs, where the latter are collected using a novel annotation protocol that produces high-quality summaries while minimizing annotation burden. We propose CATTS, a simple yet effective learning strategy for generating TLDRs that exploits titles as an auxiliary training signal. CATTS improves upon strong baselines under both automated metrics and human evaluations. Data and code are publicly available at https://github.com/allenai/scitldr.