Paper title
Template-based Abstractive Microblog Opinion Summarisation
Paper authors
Abstract
We introduce the task of microblog opinion summarisation (MOS) and share a dataset of 3100 gold-standard opinion summaries to facilitate research in this domain. The dataset contains summaries of tweets spanning a 2-year period and covers more topics than any other public Twitter summarisation dataset. Summaries are abstractive in nature and have been created by journalists skilled in summarising news articles following a template separating factual information (main story) from author opinions. Our method differs from previous work on generating gold-standard summaries from social media, which usually involves selecting representative posts and thus favours extractive summarisation models. To showcase the dataset's utility and challenges, we benchmark a range of abstractive and extractive state-of-the-art summarisation models and achieve good performance, with the former outperforming the latter. We also show that fine-tuning is necessary to improve performance and investigate the benefits of using different sample sizes.