A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation -- through the Lens of Semantic Similarity Rating

Paper title

A Dynamic, Interpreted CheckList for Meaning-oriented NLG Metric Evaluation -- through the Lens of Semantic Similarity Rating

Authors

Laura Zeidler, Juri Opitz, Anette Frank

Abstract

Evaluating the quality of generated text is difficult, since traditional NLG evaluation metrics, focusing more on surface form than meaning, often fail to assign appropriate scores. This is especially problematic for AMR-to-text evaluation, given the abstract nature of AMR. Our work aims to support the development and improvement of NLG evaluation metrics that focus on meaning, by developing a dynamic CheckList for NLG metrics that is interpreted by being organized around meaning-relevant linguistic phenomena. Each test instance consists of a pair of sentences with their AMR graphs and a human-produced textual semantic similarity or relatedness score. Our CheckList facilitates comparative evaluation of metrics and reveals strengths and weaknesses of novel and traditional metrics. We demonstrate the usefulness of CheckList by designing a new metric GraCo that computes lexical cohesion graphs over AMR concepts. Our analysis suggests that GraCo presents an interesting NLG metric worth future investigation and that meaning-oriented NLG metrics can profit from graph-based metric components using AMR.
