Paper Title

SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model

Authors

Shaoan Xie, Zhifei Zhang, Zhe Lin, Tobias Hinz, Kun Zhang

Abstract

Generic image inpainting aims to complete a corrupted image by borrowing surrounding information, which barely generates novel content. By contrast, multi-modal inpainting provides more flexible and useful controls on the inpainted content, e.g., a text prompt can be used to describe an object with richer attributes, and a mask can be used to constrain the shape of the inpainted object rather than being only considered as a missing area. We propose a new diffusion-based model named SmartBrush for completing a missing region with an object using both text and shape guidance. While previous work such as DALLE-2 and Stable Diffusion can do text-guided inpainting, they do not support shape guidance and tend to modify the background texture surrounding the generated object. Our model incorporates both text and shape guidance with precision control. To better preserve the background, we propose a novel training and sampling strategy by augmenting the diffusion U-Net with object-mask prediction. Lastly, we introduce a multi-task training strategy by jointly training inpainting with text-to-image generation to leverage more training data. We conduct extensive experiments showing that our model outperforms all baselines in terms of visual quality, mask controllability, and background preservation.
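The abstract's central technical idea, conditioning the diffusion U-Net on a shape mask and additionally training it to predict the object mask so the background is preserved, can be illustrated with a rough sketch. The code below is a minimal, hypothetical PyTorch illustration under stated assumptions, not the authors' implementation: the ToyUNet architecture, the 7-channel input layout, the 0.1 loss weight, and the omission of text conditioning are all simplifications made for brevity.

```python
# Minimal sketch of a shape-guided inpainting training step (NOT the SmartBrush code).
# Assumptions: a toy U-Net stands in for the real denoising network, text conditioning
# is omitted, and tensor shapes / loss weights are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyUNet(nn.Module):
    """Stand-in denoising network: takes noisy image + masked background + shape mask
    (7 channels) and returns predicted noise plus a predicted object mask."""
    def __init__(self, ch=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(7, ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.SiLU(),
        )
        self.noise_head = nn.Conv2d(ch, 3, 3, padding=1)  # epsilon prediction
        self.mask_head = nn.Conv2d(ch, 1, 3, padding=1)   # object-mask prediction

    def forward(self, x):
        h = self.body(x)
        return self.noise_head(h), self.mask_head(h)

def training_step(unet, image, object_mask, alphas_cumprod):
    """One illustrative denoising step with shape guidance and mask prediction."""
    b = image.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (b,))
    a = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(image)
    noisy = a.sqrt() * image + (1 - a).sqrt() * noise        # forward diffusion q(x_t | x_0)

    background = image * (1 - object_mask)                   # region that should be preserved
    net_in = torch.cat([noisy, background, object_mask], 1)  # shape guidance via the mask channel
    pred_noise, pred_mask = unet(net_in)

    denoise_loss = F.mse_loss(pred_noise, noise)
    mask_loss = F.binary_cross_entropy_with_logits(pred_mask, object_mask)
    return denoise_loss + 0.1 * mask_loss                    # weighting is an assumption

# Toy usage on random data
unet = ToyUNet()
img = torch.randn(2, 3, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.5).float()
alphas = torch.linspace(0.999, 0.01, 1000).cumprod(0)
loss = training_step(unet, img, mask, alphas)
loss.backward()
```

The auxiliary mask head mirrors the abstract's claim that predicting the object mask during denoising helps keep the generated object inside the given shape and leaves the surrounding background untouched; at sampling time the predicted mask can be used to composite the generated object back onto the original background.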
