Paper Title
Character-Aware Models Improve Visual Text Rendering
Paper Authors
Paper Abstract
Current image generation models struggle to reliably produce well-formed visual text. In this paper, we investigate a key contributing factor: popular text-to-image models lack character-level input features, making it much harder to predict a word's visual makeup as a series of glyphs. To quantify this effect, we conduct a series of experiments comparing character-aware vs. character-blind text encoders. In the text-only domain, we find that character-aware models provide large gains on a novel spelling task (WikiSpell). Applying our learnings to the visual domain, we train a suite of image generation models, and show that character-aware variants outperform their character-blind counterparts across a range of novel text rendering tasks (our DrawText benchmark). Our models set a much higher state-of-the-art on visual spelling, with 30+ point accuracy gains over competitors on rare words, despite training on far fewer examples.
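To make the character-aware vs. character-blind distinction concrete, here is a minimal sketch (not from the paper) contrasting a subword tokenizer, which hides a word's spelling behind opaque token IDs, with a byte-level tokenizer that exposes each character as its own input position. It assumes the Hugging Face transformers library and the public "t5-small" and "google/byt5-small" checkpoints; the exact subword split shown in the comments is illustrative.

```python
# Sketch: character-blind (subword) vs. character-aware (byte-level) text encoding.
# Assumes the `transformers` library and the public checkpoints named below.
from transformers import AutoTokenizer

word = "glyph"

# Character-blind: the word collapses into one or two subword IDs, so the
# encoder never directly observes its letter-by-letter composition.
subword_tok = AutoTokenizer.from_pretrained("t5-small")
subword_ids = subword_tok(word, add_special_tokens=False).input_ids
print(subword_tok.convert_ids_to_tokens(subword_ids))  # e.g. ['▁gly', 'ph']

# Character-aware: every UTF-8 byte becomes a separate input position, so the
# spelling g-l-y-p-h is directly visible to the encoder.
byte_tok = AutoTokenizer.from_pretrained("google/byt5-small")
byte_ids = byte_tok(word, add_special_tokens=False).input_ids
print(byte_tok.convert_ids_to_tokens(byte_ids))  # ['g', 'l', 'y', 'p', 'h']
```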