Lazy Diffusion Transformer for Interactive Image Editing
arXiv - CS - Graphics, Pub Date: 2024-04-18, DOI: arxiv-2404.12382
Yotam Nitzan, Zongze Wu, Richard Zhang, Eli Shechtman, Daniel Cohen-Or, Taesung Park, Michaël Gharbi

We introduce a novel diffusion transformer, LazyDiffusion, that generates partial image updates efficiently. Our approach targets interactive image editing applications in which, starting from a blank canvas or an image, a user specifies a sequence of localized image modifications using binary masks and text prompts. Our generator operates in two phases. First, a context encoder processes the current canvas and user mask to produce a compact global context tailored to the region to generate. Second, conditioned on this context, a diffusion-based transformer decoder synthesizes the masked pixels in a "lazy" fashion, i.e., it only generates the masked region. This contrasts with previous works that either regenerate the full canvas, wasting time and computation, or confine processing to a tight rectangular crop around the mask, ignoring the global image context altogether. Our decoder's runtime scales with the mask size, which is typically small, while our encoder introduces negligible overhead. We demonstrate that our approach is competitive with state-of-the-art inpainting methods in terms of quality and fidelity while providing a 10x speedup for typical user interactions, where the editing mask represents 10% of the image.
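The claim that decoder runtime scales with mask size can be illustrated with a token count. The sketch below is not the authors' implementation: it assumes the image is split into square patches (one token per patch), that the decoder only processes patches touching the mask, and the function names and patch size are hypothetical. It only counts tokens; how that count translates into wall-clock time depends on the actual architecture.

```python
from typing import List


def masked_patch_indices(mask: List[List[int]], patch: int) -> List[int]:
    """Return row-major indices of patches containing at least one masked pixel."""
    height, width = len(mask), len(mask[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    cols = width // patch
    selected = []
    for py in range(height // patch):
        for px in range(cols):
            # Scan every pixel of this patch; one masked pixel is enough.
            block = (
                mask[y][x]
                for y in range(py * patch, (py + 1) * patch)
                for x in range(px * patch, (px + 1) * patch)
            )
            if any(block):
                selected.append(py * cols + px)
    return selected


def decoder_token_fraction(mask: List[List[int]], patch: int) -> float:
    """Fraction of all patch tokens a mask-only ("lazy") decoder would process."""
    total = (len(mask) // patch) * (len(mask[0]) // patch)
    return len(masked_patch_indices(mask, patch)) / total


# Hypothetical 16x16 canvas with a 4x8 masked rectangle in the top-left corner.
mask = [[1 if (y < 4 and x < 8) else 0 for x in range(16)] for y in range(16)]
indices = masked_patch_indices(mask, patch=4)      # patches 0 and 1
fraction = decoder_token_fraction(mask, patch=4)   # 2 of 16 tokens = 0.125
```

In this toy case the decoder would handle 2 of 16 tokens. A mask covering about 10% of the image would likewise leave roughly a tenth of the tokens to generate, which is the regime in which the abstract reports its 10x speedup.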

Updated: 2024-04-19