Referring Flexible Image Restoration
arXiv - CS - Multimedia, Pub Date: 2024-04-16, DOI: arxiv-2404.10342
Runwei Guan, Rongsheng Hu, Zhuhao Zhou, Tianlang Xue, Ka Lok Man, Jeremy Smith, Eng Gee Lim, Weiping Ding, Yutao Yue

In practice, images often exhibit several degradations at once, such as rain and fog at night (a triple degradation). In many cases, however, a user may not want to remove all of them: for instance, when a blurry lens captures a beautiful snowy landscape (a double degradation), the user may only want to deblur. Such situations and requirements point to a new challenge in image restoration, in which a model must perceive and remove the specific degradation types named in a human command from images with multiple degradations. We term this task Referring Flexible Image Restoration (RFIR). To address it, we first construct a large-scale synthetic dataset, also called RFIR, comprising 153,423 samples, each consisting of a degraded image, a text prompt specifying the degradation to remove, and the restored image. RFIR covers five basic degradation types (blur, rain, haze, low light, and snow) and includes six main sub-categories for varying degrees of degradation removal. To tackle the task, we propose a novel transformer-based multi-task model named TransRFIR, which simultaneously perceives the degradation types in the degraded image and removes the specific degradation indicated by the text prompt. TransRFIR is built on two purpose-designed attention modules, Multi-Head Agent Self-Attention (MHASA) and Multi-Head Agent Cross Attention (MHACA). Both introduce the agent token and reach linear complexity, lowering computation cost relative to vanilla self-attention and cross-attention while retaining competitive performance. TransRFIR achieves state-of-the-art performance compared with its counterparts and proves to be an effective architecture for image restoration. We release our project at https://github.com/GuanRunwei/FIR-CP.
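The abstract does not spell out how MHASA and MHACA are formulated, so the following is only a minimal single-head sketch of the general agent-token idea: a small set of n agent tokens first summarizes the keys and values, and the queries then attend to those n summaries instead of to every key, so the cost grows linearly with the number of tokens. The function names, the chunk-averaging used to form agent tokens, and all shapes are assumptions made for illustration; this is not the TransRFIR implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pool_agents(q, n_agents):
    # Hypothetical way to obtain agent tokens: average contiguous chunks of queries.
    chunks = np.array_split(q, n_agents, axis=0)
    return np.stack([c.mean(axis=0) for c in chunks])

def agent_attention(q, k, v, agents):
    # q: (N, d) queries, k: (M, d) keys, v: (M, d_v) values, agents: (n, d).
    scale = 1.0 / np.sqrt(q.shape[-1])
    # Step 1 (aggregation): each agent attends over all M keys -> (n, d_v).
    agent_values = softmax(agents @ k.T * scale) @ v
    # Step 2 (broadcast): each query attends over only the n agents -> (N, d_v).
    return softmax(q @ agents.T * scale) @ agent_values

rng = np.random.default_rng(0)
image_tokens = rng.standard_normal((64, 16))  # stand-in for image features
text_tokens = rng.standard_normal((10, 16))   # stand-in for prompt features
agents = pool_agents(image_tokens, 8)

# Self-attention style: queries, keys and values all come from the image tokens.
self_out = agent_attention(image_tokens, image_tokens, image_tokens, agents)
# Cross-attention style: image queries, text keys and values.
cross_out = agent_attention(image_tokens, text_tokens, text_tokens, agents)
print(self_out.shape, cross_out.shape)
```

With N queries, M keys and n agents, the two steps cost on the order of n·M·d and N·n·d operations, compared with N·M·d for vanilla attention; for fixed n this is linear in the sequence length in the self-attention case.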

Updated: 2024-04-17