CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing
ACM Transactions on Graphics (IF 6.2) Pub Date: 2023-07-19, DOI: 10.1145/3610287
Ahmet Canberk Baykal, Abdul Basit Anees, Duygu Ceylan, Erkut Erdem, Aykut Erdem, Deniz Yuret

Researchers have recently begun exploring the use of StyleGAN-based models for real image editing. One particularly interesting application is using natural language descriptions to guide the editing process. Existing approaches for editing images using language either resort to instance-level latent code optimization or map predefined text prompts to some editing directions in the latent space. However, these approaches have inherent limitations. The former is not very efficient, while the latter often struggles to effectively handle multi-attribute changes. To address these weaknesses, we present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes. The core of our method is the use of novel, lightweight text-conditioned adapter layers integrated into pretrained GAN-inversion networks. We demonstrate that by conditioning the initial inversion step on the CLIP embedding of the target description, we are able to obtain more successful edit directions. Additionally, we use a CLIP-guided refinement step to make corrections in the resulting residual latent codes, which further improves the alignment with the text prompt. Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds, as shown by our qualitative and quantitative results.
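The abstract describes conditioning the inversion network on the CLIP embedding of the target text via lightweight adapter layers. A minimal sketch of one way such conditioning could work is feature-wise modulation (FiLM-style): the text embedding produces a per-channel scale and shift applied to the inversion network's features, so the predicted residual latent code depends on the description. All names, shapes, and the modulation scheme below are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a text-conditioned adapter layer (FiLM-style).
# The scheme and all names/shapes are assumptions for illustration only.

def linear(vec, weights):
    """Plain matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def text_conditioned_adapter(feat, text_emb, w_gamma, w_beta):
    """Modulate inversion-network features with a text embedding:
    the CLIP text embedding yields a per-channel scale (gamma) and
    shift (beta) applied to the features."""
    gamma = linear(text_emb, w_gamma)  # per-channel scale from text
    beta = linear(text_emb, w_beta)    # per-channel shift from text
    return [f * (1.0 + g) + b for f, g, b in zip(feat, gamma, beta)]

# Toy sizes: 4-d text embedding, 3-channel features.
feat = [0.5, -1.0, 2.0]
text_emb = [1.0, 0.0, 0.0, 0.0]
w_gamma = [[0.1, 0, 0, 0], [0.2, 0, 0, 0], [0.0, 0, 0, 0]]  # 3x4
w_beta  = [[0.0, 0, 0, 0], [0.0, 0, 0, 0], [0.5, 0, 0, 0]]  # 3x4

out = text_conditioned_adapter(feat, text_emb, w_gamma, w_beta)
print(out)  # [0.55, -1.2, 2.5]
```

Because the modulation parameters come from the text rather than being fixed, the same inversion backbone yields different residual latent codes for different target descriptions, which is the property the adapter layers are described as providing.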




Updated: 2023-07-19