Understanding Metrics for Paraphrasing
Abstract
Paraphrase generation is a difficult problem, not only because of the limitations of text generation capabilities but also because of the lack of a proper definition of what qualifies as a paraphrase, and of corresponding metrics to measure how good one is. Evaluating paraphrase quality is an ongoing research problem. Most existing metrics, having been borrowed from other tasks, do not capture the complete essence of a good paraphrase and often fail on borderline cases. In this work, we propose a novel metric, $ROUGE_P$, to measure the quality of paraphrases along the dimensions of adequacy, novelty, and fluency. We also provide empirical evidence that current natural language generation metrics are insufficient to measure these desired properties of a good paraphrase. We examine paraphrase model fine-tuning and generation through the lens of metrics to gain a deeper understanding of what it takes to generate and evaluate a good paraphrase.
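To make the abstract's claim concrete: overlap-based metrics such as ROUGE reward copying the reference verbatim, which is precisely the opposite of novelty in paraphrasing. The sketch below is an illustrative unigram-F1 score in the style of ROUGE-1 — it is *not* the $ROUGE_P$ formula proposed in the paper — showing that a word-for-word copy scores perfectly while a reasonable paraphrase scores much lower.

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 in the style of ROUGE-1 (illustrative only)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at most min(cand, ref) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

source = "the quick brown fox jumps over the lazy dog"
copy = source                                        # zero novelty
paraphrase = "a fast brown fox leaps over a lazy dog"

print(rouge1_f1(copy, source))        # 1.0 -- verbatim copy scores perfectly
print(rouge1_f1(paraphrase, source))  # ~0.56, despite preserving the meaning
```

A metric sensitive to novelty would need to penalize the first case rather than maximize it, which is the gap the paper's $ROUGE_P$ aims to address.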