Bringing Textual Prompt to AI-Generated Image Quality Assessment, arXiv - CS - Multimedia

arXiv - CS - Multimedia, Pub Date: 2024-03-27, DOI: arxiv-2403.18714
Bowen Qu, Haohui Li, Wei Gao

AI-Generated Images (AGIs) are inherently multimodal. Unlike traditional image quality assessment (IQA) of natural scenes, AGI quality assessment (AGIQA) takes into account how well an image corresponds to its textual prompt. This correspondence is entangled in the ground-truth score, which confuses unimodal IQA methods. To solve this problem, we introduce IP-IQA (AGIs Quality Assessment via Image and Prompt), a multimodal framework for AGIQA that incorporates both the image and its corresponding prompt. Specifically, we propose a novel incremental pretraining task named Image2Prompt for better understanding of AGIs and their corresponding textual prompts. We also apply an effective and efficient image-prompt fusion module together with a novel special [QA] token. Both are plug-and-play and help the image and its corresponding prompt work together. Experiments demonstrate that IP-IQA achieves state-of-the-art results on the AGIQA-1k and AGIQA-3k datasets. Code will be available.
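
The abstract does not describe how the fusion module or the [QA] token is built, so the following is only a minimal sketch of one common way such a design can work, not the authors' implementation. It assumes the [QA] token acts as a single attention query over the concatenated image and prompt token embeddings, and that a linear head maps the fused vector to a quality score. The function name `qa_token_fusion`, the parameter names, and the random toy embeddings are all hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def qa_token_fusion(image_tokens, prompt_tokens, qa_token, head_weights, head_bias=0.0):
    """Pool image and prompt tokens through one [QA] query and regress a score.

    Assumption: single-head scaled dot-product attention with no learned
    projections. The paper's actual fusion module may differ.
    """
    tokens = image_tokens + prompt_tokens  # joint multimodal sequence
    dim = len(qa_token)
    # Attention weights of the [QA] query over every image and prompt token.
    weights = softmax([dot(qa_token, t) / math.sqrt(dim) for t in tokens])
    # The fused representation is the attention-weighted sum of the tokens.
    fused = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]
    # Hypothetical linear quality head on top of the fused vector.
    score = dot(head_weights, fused) + head_bias
    return score, weights


# Toy usage with random stand-ins for encoder outputs (illustrative only).
rng = random.Random(0)
dim = 4
image_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
prompt_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(2)]
qa_token = [rng.gauss(0, 1) for _ in range(dim)]
head_weights = [rng.gauss(0, 1) for _ in range(dim)]
score, weights = qa_token_fusion(image_tokens, prompt_tokens, qa_token, head_weights)
```

In this sketch the returned `weights` show how much the [QA] query draws on each image or prompt token, which is one way a single predicted score can depend on both modalities at once. In a trained model the token embeddings, the [QA] vector, and the head would be learned rather than sampled at random.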

Updated: 2024-03-28