Paper Title
Character-Aware Models Improve Visual Text Rendering
Paper Authors
Paper Abstract
Current image generation models struggle to reliably produce well-formed visual text. In this paper, we investigate a key contributing factor: popular text-to-image models lack character-level input features, making it much harder to predict a word's visual makeup as a series of glyphs. To quantify this effect, we conduct a series of experiments comparing character-aware vs. character-blind text encoders. In the text-only domain, we find that character-aware models provide large gains on a novel spelling task (WikiSpell). Applying our learnings to the visual domain, we train a suite of image generation models, and show that character-aware variants outperform their character-blind counterparts across a range of novel text rendering tasks (our DrawText benchmark). Our models set a much higher state-of-the-art on visual spelling, with 30+ point accuracy gains over competitors on rare words, despite training on far fewer examples.
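To make the character-aware vs. character-blind distinction concrete, here is a minimal sketch (not from the paper) contrasting a subword tokenizer, which hides a word's spelling behind opaque token IDs, with a byte-level tokenizer that exposes each character as its own input position. It assumes the Hugging Face transformers library and the public "t5-small" and "google/byt5-small" checkpoints; the exact subword split shown in the comments is illustrative.

```python
# Sketch: character-blind (subword) vs. character-aware (byte-level) text encoding.
# Assumes the `transformers` library and the public checkpoints named below.
from transformers import AutoTokenizer

word = "glyph"

# Character-blind: the word collapses into one or two subword IDs, so the
# encoder never directly observes its letter-by-letter composition.
subword_tok = AutoTokenizer.from_pretrained("t5-small")
subword_ids = subword_tok(word, add_special_tokens=False).input_ids
print(subword_tok.convert_ids_to_tokens(subword_ids))  # e.g. ['▁gly', 'ph']

# Character-aware: every UTF-8 byte becomes a separate input position, so the
# spelling g-l-y-p-h is directly visible to the encoder.
byte_tok = AutoTokenizer.from_pretrained("google/byt5-small")
byte_ids = byte_tok(word, add_special_tokens=False).input_ids
print(byte_tok.convert_ids_to_tokens(byte_ids))  # ['g', 'l', 'y', 'p', 'h']
```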