Paper Title

Distribution Aware Metrics for Conditional Natural Language Generation

Authors

David M. Chan, Yiming Ni, David A. Ross, Sudheendra Vijayanarasimhan, Austin Myers, John Canny

Abstract

Traditional automated metrics for evaluating conditional natural language generation use pairwise comparisons between a single generated text and the best-matching gold-standard ground truth text. When multiple ground truths are available, scores are aggregated using an average or max operation across references. While this approach works well when diversity in the ground truth data (i.e., dispersion of the distribution of conditional texts) can be ascribed to noise, such as in automated speech recognition, it does not allow for robust evaluation when diversity in the ground truths represents signal for the model. In this work, we argue that existing metrics are not appropriate for domains such as visual description or summarization, where ground truths are semantically diverse and the diversity in those captions captures useful additional information about the context. We propose a novel paradigm for multi-candidate evaluation of conditional language generation models, and a new family of metrics that compare the distributions of reference and model-generated caption sets using small sample sets of each. We demonstrate the utility of our approach with a case study in visual description, where we show that existing models optimize for single-description quality over diversity, and we gain insights into how sampling methods and temperature impact description quality and diversity.
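
The abstract's core idea is to compare the distribution of a set of model samples against the distribution of the reference set, rather than scoring a single candidate against its best-matching reference. The sketch below is a minimal, hypothetical illustration of that paradigm, not the paper's actual metric family: it assumes sentence embeddings from the sentence-transformers library (the all-MiniLM-L6-v2 model) and uses a Gaussian-kernel Maximum Mean Discrepancy (MMD) as the set-level comparison statistic.

import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedding backbone

def gaussian_kernel(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    # Pairwise Gaussian (RBF) kernel between two sets of embeddings.
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd2(refs: list[str], gens: list[str], model: SentenceTransformer) -> float:
    # Biased estimate of the squared Maximum Mean Discrepancy between the
    # reference caption distribution and the model's caption distribution,
    # computed from small sample sets of each (as the abstract describes).
    r = model.encode(refs)  # (n_refs, d) embeddings of ground-truth captions
    g = model.encode(gens)  # (n_gens, d) embeddings of model samples
    k_rr = gaussian_kernel(r, r).mean()
    k_gg = gaussian_kernel(g, g).mean()
    k_rg = gaussian_kernel(r, g).mean()
    return float(k_rr + k_gg - 2.0 * k_rg)  # ~0 when the sets match in distribution

if __name__ == "__main__":
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed, illustrative choice
    refs = ["a dog runs along the beach",
            "a puppy plays near the ocean",
            "a brown dog sprints across wet sand"]
    gens = ["a dog is running on a beach",
            "a dog running along the shore",
            "a dog on the sand by the sea"]
    print(f"MMD^2 between caption sets: {mmd2(refs, gens, model):.4f}")

A lower MMD^2 means the generated set sits closer to the reference distribution in embedding space. Note how this differs from pairwise metrics: a model that emits one high-quality caption repeatedly can score well under a max-over-references comparison but poorly under a set-level statistic, which is the behavior the distribution-aware paradigm is designed to expose.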
