Paper Title
Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling
Paper Authors
Paper Abstract
We explore to what extent knowledge about the pre-trained language model that is used is beneficial for the task of abstractive summarization. To this end, we experiment with conditioning the encoder and decoder of a Transformer-based neural model on the BERT language model. In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size. We also explore how locality modeling, i.e., the explicit restriction of calculations to the local context, can affect the summarization ability of the Transformer. This is done by introducing 2-dimensional convolutional self-attention into the first layers of the encoder. The results of our models are compared to a baseline and to state-of-the-art models on the CNN/Daily Mail dataset. We additionally train our model on the SwissText dataset to demonstrate usability on German. Both models outperform the baseline in ROUGE scores on both datasets and show their superiority in a manual qualitative analysis.
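The BERT-windowing idea mentioned in the abstract can be pictured as sliding a fixed-size window over the token sequence and merging the BERT representations where windows overlap. The snippet below is a minimal sketch of that idea using Hugging Face transformers; the window size, stride, and averaging of overlapping outputs are illustrative assumptions and not necessarily the exact merging procedure used in the paper.

```python
import torch
from transformers import BertModel, BertTokenizerFast


def bert_windowed_encode(text, tokenizer, model, window_size=512, stride=256):
    """Encode a text longer than the BERT window by sliding a window over the
    token sequence and averaging representations where windows overlap.
    (Illustrative sketch; not necessarily the paper's exact procedure.)"""
    ids = tokenizer(text, add_special_tokens=False, return_tensors="pt")["input_ids"][0]
    n = ids.size(0)
    hidden = torch.zeros(n, model.config.hidden_size)
    counts = torch.zeros(n, 1)
    start = 0
    while start < n:
        end = min(start + window_size, n)
        with torch.no_grad():
            out = model(input_ids=ids[start:end].unsqueeze(0)).last_hidden_state[0]
        hidden[start:end] += out
        counts[start:end] += 1
        if end == n:
            break
        start += stride
    return hidden / counts  # (n, hidden_size) contextual token representations


# Hypothetical usage:
# tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
# model = BertModel.from_pretrained("bert-base-uncased")
# token_states = bert_windowed_encode(long_article, tokenizer, model)
```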
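Locality modeling, i.e. restricting self-attention to a local context, can be illustrated with a simple band mask over the attention scores. The paper's 2-dimensional convolutional self-attention in the first encoder layers is richer than this; the sketch below, with an assumed window radius parameter, only demonstrates the basic locality restriction.

```python
import torch


def local_self_attention(q, k, v, window=16):
    """Scaled dot-product self-attention restricted to a local window:
    position i may only attend to positions j with |i - j| <= window.
    (Sketch of the locality restriction only, not the paper's
    2-D convolutional self-attention.)"""
    seq_len, d = q.shape
    scores = q @ k.t() / d ** 0.5                       # (seq_len, seq_len)
    idx = torch.arange(seq_len)
    dist = (idx.unsqueeze(0) - idx.unsqueeze(1)).abs()  # pairwise token distances
    scores = scores.masked_fill(dist > window, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v            # (seq_len, d)
```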