Paper Title

CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation

Paper Authors

Pei Ke, Hao Zhou, Yankai Lin, Peng Li, Jie Zhou, Xiaoyan Zhu, Minlie Huang

Paper Abstract

Existing reference-free metrics have obvious limitations for evaluating controlled text generation models. Unsupervised metrics can only provide a task-agnostic evaluation result which correlates weakly with human judgments, whereas supervised ones may overfit task-specific data with poor generalization ability to other datasets. In this paper, we propose an unsupervised reference-free metric called CTRLEval, which evaluates controlled text generation from different aspects by formulating each aspect into multiple text infilling tasks. On top of these tasks, the metric assembles the generation probabilities from a pre-trained language model without any model training. Experimental results show that our metric has higher correlations with human judgments than other baselines, while obtaining better generalization of evaluating generated texts from different models and with different qualities.
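To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of how a frozen pre-trained language model can score one text infilling task: the generated text is treated as a blanked-out span, and its length-normalized log-probability under the LM serves as an evaluation signal, with no model training involved. The sketch assumes the Hugging Face transformers library and uses t5-base with its sentinel-token infilling format as a stand-in for the paper's actual pre-trained model and task construction; infill_log_prob and the example inputs are hypothetical.

# Minimal sketch of scoring a single text infilling task with a frozen
# pre-trained LM, in the spirit of the abstract. Assumes: pip install
# torch transformers sentencepiece. t5-base is a stand-in model choice.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.eval()  # evaluation only; no model training

def infill_log_prob(context_with_blank: str, candidate_span: str) -> float:
    """Average log-probability of `candidate_span` filling the blank.

    `context_with_blank` must contain T5's sentinel token <extra_id_0>
    where the evaluated text should go. Higher scores mean the frozen
    LM finds the candidate a more natural fill for that blank.
    """
    inputs = tokenizer(context_with_blank, return_tensors="pt")
    # T5's infilling target is the span prefixed by the same sentinel.
    target = tokenizer("<extra_id_0> " + candidate_span, return_tensors="pt")
    with torch.no_grad():
        out = model(input_ids=inputs.input_ids,
                    attention_mask=inputs.attention_mask,
                    labels=target.input_ids)
    # out.loss is the mean token-level cross-entropy over the target,
    # i.e. the negative length-normalized log-likelihood of the span.
    return -out.loss.item()

# Hypothetical usage: score a generated sentence by asking how well it
# fits back into its surrounding context. An aspect score would then
# aggregate such probabilities over multiple infilling tasks.
score = infill_log_prob(
    "The movie was a delight from start to finish. <extra_id_0> "
    "I would watch it again.",
    "The acting was superb and the plot kept me engaged.",
)
print(f"infill score: {score:.3f}")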
