An Efficient Coarse-to-Fine Facet-Aware Unsupervised Summarization Framework Based on Semantic Blocks

Paper title

An Efficient Coarse-to-Fine Facet-Aware Unsupervised Summarization Framework based on Semantic Blocks

Authors

Xinnian Liang, Jing Li, Shuangzhi Wu, Jiali Zeng, Yufan Jiang, Mu Li, Zhoujun Li

Abstract

Unsupervised summarization methods have achieved remarkable results by incorporating representations from pre-trained language models. However, when the input document is extremely long, existing methods fail to achieve efficiency and effectiveness at the same time. To tackle this problem, we propose an efficient Coarse-to-Fine Facet-Aware Ranking (C2F-FAR) framework for unsupervised long document summarization, built on semantic blocks. A semantic block is a run of consecutive sentences in the document that describe the same facet. Specifically, we convert the one-step ranking method into a hierarchical, multi-granularity, two-stage ranking. In the coarse-level stage, we propose a new segmentation algorithm that splits the document into facet-aware semantic blocks and then filters out insignificant blocks. In the fine-level stage, we select salient sentences in each block and then extract the final summary from the selected sentences. We evaluate our framework on four long document summarization datasets: Gov-Report, BillSum, arXiv, and PubMed. C2F-FAR achieves new state-of-the-art unsupervised summarization results on Gov-Report and BillSum. In addition, our method is 4 to 28 times faster than previous methods.

Code: https://github.com/xnliang98/c2f-far
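The two-stage idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the bag-of-words `embed` function stands in for pre-trained language model representations, the adjacent-sentence similarity threshold in `segment` is a simplified substitute for the paper's segmentation algorithm, and the centrality scoring and all names and parameters (`keep_blocks`, `per_block`, `summary_len`, `threshold`) are assumptions made purely for illustration.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy bag-of-words vector (assumed stand-in for a pre-trained encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(sentences, threshold=0.2):
    """Group consecutive sentences into blocks of sentence indices.

    A new block starts whenever similarity to the previous sentence
    falls below the (hypothetical) threshold.
    """
    blocks = []
    for i, sent in enumerate(sentences):
        if blocks and cosine(embed(sentences[i - 1]), embed(sent)) >= threshold:
            blocks[-1].append(i)
        else:
            blocks.append([i])
    return blocks


def centrality(vectors):
    """Score each vector by its summed similarity to all other vectors."""
    return [
        sum(cosine(v, w) for j, w in enumerate(vectors) if j != i)
        for i, v in enumerate(vectors)
    ]


def merge(counters):
    """Sum several count vectors into one block-level vector."""
    total = Counter()
    for c in counters:
        total.update(c)
    return total


def top_k(items, scores, k):
    """Return the k items with the highest scores (ties keep input order)."""
    order = sorted(range(len(items)), key=lambda i: scores[i], reverse=True)
    return [items[i] for i in order[:k]]


def summarize(sentences, keep_blocks=2, per_block=1, summary_len=2):
    vecs = [embed(s) for s in sentences]
    blocks = segment(sentences)

    # Coarse stage: rank whole blocks and drop the low-scoring ones.
    block_vecs = [merge(vecs[i] for i in block) for block in blocks]
    kept = top_k(blocks, centrality(block_vecs), keep_blocks)

    # Fine stage: pick salient sentences inside each surviving block.
    candidates = []
    for block in kept:
        scores = centrality([vecs[i] for i in block])
        candidates.extend(top_k(block, scores, per_block))

    # Final extraction: rank the candidates against each other,
    # then restore document order.
    final_scores = centrality([vecs[i] for i in candidates])
    chosen = sorted(top_k(candidates, final_scores, summary_len))
    return [sentences[i] for i in chosen]
```

In this sketch the expensive pairwise comparison only ever runs over blocks, over sentences within one block, or over the small candidate set, never over all sentence pairs in the document, which is the intuition behind the efficiency gain the abstract reports.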

