Paper Title
Revisiting Challenges in Data-to-Text Generation with Fact Grounding
Authors
Abstract
Data-to-text generation models face challenges in ensuring data fidelity by referring to the correct input source. To inspire studies in this area, Wiseman et al. (2017) introduced the RotoWire corpus on generating NBA game summaries from the box- and line-score tables. However, limited attempts have been made in this direction and the challenges remain. We observe a prominent bottleneck in the corpus where only about 60% of the summary contents can be grounded to the boxscore records. Such information deficiency tends to misguide a conditioned language model to produce unconditioned random facts and thus leads to factual hallucinations. In this work, we restore the information balance and revamp this task to focus on fact-grounded data-to-text generation. We introduce a purified and larger-scale dataset, RotoWire-FG (Fact-Grounding), with 50% more data from the years 2017-19 and enriched input tables, in the hope of attracting more research in this direction. Moreover, we achieve improved data fidelity over the state-of-the-art models by integrating a new form of table reconstruction as an auxiliary task to boost the generation quality.