Paper Title

Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity

Authors

Hamza Harkous, Isabel Groves, Amir Saffari

Abstract

End-to-end neural data-to-text (D2T) generation has recently emerged as an alternative to pipeline-based architectures. However, it has faced challenges in generalizing to new domains and generating semantically consistent text. In this work, we present DataTuner, a neural, end-to-end data-to-text generation system that makes minimal assumptions about the data representation and the target domain. We take a two-stage generation-reranking approach, combining a fine-tuned language model with a semantic fidelity classifier. Each of our components is learnt end-to-end without the need for dataset-specific heuristics, entity delexicalization, or post-processing. We show that DataTuner achieves state-of-the-art results on the automated metrics across four major D2T datasets (LDC2017T10, WebNLG, ViGGO, and Cleaned E2E), with fluency assessed by human annotators as nearing or exceeding that of the human-written reference texts. We further demonstrate that the model-based semantic fidelity scorer in DataTuner is a better assessment tool compared to traditional, heuristic-based measures. Our generated text has a significantly better semantic fidelity than the state of the art across all four datasets.
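The two-stage approach the abstract describes (generate candidates with a fine-tuned language model, then rerank them with a semantic fidelity classifier) can be sketched as below. This is a minimal illustration, not DataTuner's implementation: `generate_candidates` and `semantic_fidelity_score` are toy stand-ins for the fine-tuned LM and the trained classifier, and the record format is invented for the example.

```python
def generate_candidates(data, n=3):
    """Toy stand-in for a fine-tuned LM: emit n candidate verbalizations
    of the input record, including flawed ones a real sampler might produce."""
    name, cuisine = data["name"], data["cuisine"]
    return [
        f"{name} serves {cuisine} food.",
        f"{name} is a restaurant.",                 # omits the cuisine fact
        f"{name} serves {cuisine} and Thai food.",  # hallucinates a fact
    ][:n]

def semantic_fidelity_score(data, text):
    """Toy stand-in for the fidelity classifier: reward texts that mention
    every input value and penalize unsupported ("Thai") content."""
    mentioned = sum(1 for value in data.values() if value in text)
    hallucinated = 1 if ("Thai" in text and data["cuisine"] != "Thai") else 0
    return mentioned - 2 * hallucinated

def rerank(data, n=3):
    """Stage 2: keep the candidate the fidelity scorer rates highest."""
    candidates = generate_candidates(data, n)
    return max(candidates, key=lambda text: semantic_fidelity_score(data, text))

record = {"name": "The Punter", "cuisine": "Italian"}
print(rerank(record))  # → "The Punter serves Italian food."
```

The key design point is that the scorer, not the generator, decides which output is emitted, so omissions and hallucinations sampled by the language model can be filtered out without dataset-specific heuristics or post-processing.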
