Paper Title

BMX: Boosting Natural Language Generation Metrics with Explainability

Authors

Christoph Leiter, Hoa Nguyen, Steffen Eger

Abstract

State-of-the-art natural language generation evaluation metrics are based on black-box language models. Hence, recent works consider their explainability with the goals of better understandability for humans and better metric analysis, including failure cases. In contrast, our proposed method BMX: Boosting Natural Language Generation Metrics with explainability explicitly leverages explanations to boost the metrics' performance. In particular, we perceive feature importance explanations as word-level scores, which we convert, via power means, into a segment-level score. We then combine this segment-level score with the original metric to obtain a better metric. Our tests show improvements for multiple metrics across MT and summarization datasets. While improvements in machine translation are small, they are strong for summarization. Notably, BMX with the LIME explainer and preselected parameters achieves an average improvement of 0.087 points in Spearman correlation on the system-level evaluation of SummEval.
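To make the pipeline described in the abstract concrete, below is a minimal Python sketch of the combination step: word-level feature-importance scores from an explainer are aggregated into a segment-level score via a power mean, which is then interpolated with the original metric score. The function names, the interpolation weight `w`, and the default parameters are illustrative assumptions, not the paper's actual implementation or tuned settings.

```python
import numpy as np


def power_mean(scores, p):
    """Power (generalized) mean of word-level scores.

    p = 1 gives the arithmetic mean; p -> 0 is handled as the
    geometric mean. Assumes positive scores for non-integer p.
    """
    scores = np.asarray(scores, dtype=float)
    if p == 0:
        return float(np.exp(np.mean(np.log(scores))))
    return float(np.mean(scores ** p) ** (1.0 / p))


def boosted_score(metric_score, word_scores, p=1.0, w=0.5):
    """Combine the original segment-level metric score with the
    power mean of word-level explanation scores.

    `p` and `w` are hyperparameters; the defaults here are
    hypothetical placeholders, not the paper's preselected values.
    """
    explanation_score = power_mean(word_scores, p)
    return w * metric_score + (1.0 - w) * explanation_score


if __name__ == "__main__":
    # Made-up example: a metric score of 0.72 and word-level
    # importance scores such as those produced by an explainer like LIME.
    word_scores = [0.9, 0.4, 0.8, 0.6]
    print(boosted_score(0.72, word_scores, p=2.0, w=0.7))
```

In this sketch, larger `p` weights the aggregation toward the highest-scoring words, while `w` controls how much the explanation-derived score moves the final metric away from its original value.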
