Paper Title

MAUVE Scores for Generative Models: Theory and Practice

Authors

Krishna Pillutla, Lang Liu, John Thickstun, Sean Welleck, Swabha Swayamdipta, Rowan Zellers, Sewoong Oh, Yejin Choi, Zaid Harchaoui

Abstract

Generative artificial intelligence has made significant strides, producing text indistinguishable from human prose and remarkably photorealistic images. Automatically measuring how close the generated data distribution is to the target distribution is central to diagnosing existing models and developing better ones. We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images. These scores are statistical summaries of divergence frontiers capturing two types of errors in generative modeling. We explore three approaches to statistically estimate these scores: vector quantization, non-parametric estimation, and classifier-based estimation. We provide statistical bounds for the vector quantization approach. Empirically, we find that the proposed scores paired with a range of $f$-divergences and statistical estimation methods can quantify the gaps between the distributions of human-written text and those of modern neural language models by correlating with human judgments and identifying known properties of the generated texts. We demonstrate in the vision domain that MAUVE can identify known properties of generated images on par with or better than existing metrics. In conclusion, we present practical recommendations for using MAUVE effectively with language and image modalities.
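
To make the vector-quantization estimator mentioned in the abstract concrete, here is a minimal, self-contained Python sketch. It is not the authors' reference implementation; the function name, the default cluster count, the scaling constant, and the synthetic demo data are illustrative assumptions. The sketch jointly k-means-quantizes two sets of sample embeddings into histograms, traces the exponentiated-KL divergence frontier over their mixtures, and summarizes it by the area under the curve.

```python
import numpy as np
from sklearn.cluster import KMeans


def mauve_vq_score(p_features, q_features, num_clusters=50,
                   scaling=5.0, num_mixtures=100, seed=0):
    """Area under the exponentiated-KL divergence frontier between two
    sample sets, estimated via joint vector quantization (k-means)."""
    # 1) Jointly quantize both sample sets so they share a discrete support.
    joint = np.vstack([p_features, q_features])
    labels = KMeans(n_clusters=num_clusters, n_init=10,
                    random_state=seed).fit_predict(joint)
    p_hist = np.bincount(labels[:len(p_features)],
                         minlength=num_clusters).astype(float)
    q_hist = np.bincount(labels[len(p_features):],
                         minlength=num_clusters).astype(float)
    p_hist /= p_hist.sum()
    q_hist /= q_hist.sum()

    def kl(a, b):
        # KL(a || b), restricted to the support of a; b > 0 there because
        # b is a strict mixture involving a.
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    # 2) Trace the divergence frontier over mixtures r = lam*p + (1-lam)*q,
    #    mapping each mixture to (exp(-c*KL(q||r)), exp(-c*KL(p||r))).
    lambdas = np.linspace(1e-6, 1 - 1e-6, num_mixtures)
    xs, ys = [], []
    for lam in lambdas:
        r = lam * p_hist + (1 - lam) * q_hist
        xs.append(np.exp(-scaling * kl(q_hist, r)))
        ys.append(np.exp(-scaling * kl(p_hist, r)))

    # 3) Summarize the frontier by its area under the curve (trapezoid rule).
    xs, ys = np.array(xs), np.array(ys)
    order = np.argsort(xs)
    xs, ys = xs[order], ys[order]
    return float(np.sum(0.5 * (ys[1:] + ys[:-1]) * (xs[1:] - xs[:-1])))


if __name__ == "__main__":
    # Synthetic stand-ins for "human" and "model" sample embeddings.
    rng = np.random.default_rng(0)
    human = rng.normal(0.0, 1.0, size=(2000, 32))
    model = rng.normal(0.3, 1.0, size=(2000, 32))
    # Scores near 1.0 indicate the two distributions are close.
    print(mauve_vq_score(human, model))
```

In real use, the inputs would be feature embeddings of human-written and model-generated samples (e.g., hidden states from a pretrained language model), and choices such as the number of clusters and the scaling constant affect the score. For the authors' released implementation, see the open-source `mauve-text` package, which also handles featurization.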
