SAGAN: Skip attention generative adversarial networks for few-shot image generation
Digital Signal Processing (IF 2.9) Pub Date: 2024-03-15, DOI: 10.1016/j.dsp.2024.104466
Ali Aldhubri, Jianfeng Lu, Guanyiman Fu

The task of producing high-quality, realistic, and diverse images from only a few instances of newly emerging or long-tail categories is known as few-shot image generation. Although prior work has shown strong results, the quality and diversity of the outputs remain limited. In this paper, we tackle this problem with a set of novel fusion techniques that combine attention mechanisms with generative adversarial networks. Our framework introduces global skip attention, which links matching residual blocks of symmetric encoder-decoder pairs to generate new instance objects. We also incorporate an alignment algorithm based on spatial transformer networks into the encoder pipeline to address feature misalignment. In the decoding phase, we propose a novel attention mechanism within each fusion residual block of the attention-based decoder that captures long-range dependencies in feature maps. We further propose an attention reconstruction loss function to balance adversarial training between the generator and discriminator, mitigate mode collapse, and guide the generator toward specific regions of interest within images. Finally, we apply a back summation to the decoder outputs, producing unified features through a weighted combination of similar characteristics. Extensive experiments on five few-shot image datasets demonstrate the effectiveness of the proposed model. The source code is available on GitHub.
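The abstract does not spell out the exact skip-attention design, so the following PyTorch sketch is only one plausible reading: a decoder feature map queries the matching encoder feature map through cross-attention instead of a plain concatenation skip. The module name `SkipAttentionFusion` and all hyperparameters are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn

class SkipAttentionFusion(nn.Module):
    """Hypothetical skip-attention fusion between matching encoder/decoder
    residual blocks: the decoder feature queries the encoder feature via
    cross-attention rather than a plain concatenation skip."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned gate, starts closed

    def forward(self, dec: torch.Tensor, enc: torch.Tensor) -> torch.Tensor:
        b, c, h, w = dec.shape
        q = self.query(dec).flatten(2).transpose(1, 2)  # (B, HW, C/r)
        k = self.key(enc).flatten(2)                    # (B, C/r, HW)
        v = self.value(enc).flatten(2)                  # (B, C, HW)
        attn = torch.softmax(q @ k, dim=-1)             # (B, HW, HW) pairwise weights
        fused = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return dec + self.gamma * fused  # inject attended encoder content
```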
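The alignment step follows the standard spatial transformer network pattern: a small localization head predicts an affine transform from the feature map, and the features are resampled accordingly. The sketch below shows that generic pattern only; the paper's actual localization network may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STNAlign(nn.Module):
    """Hypothetical STN-based alignment: a localization head predicts one
    affine transform per sample and the feature map is warped with it."""

    def __init__(self, channels: int):
        super().__init__()
        self.loc = nn.Sequential(
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(channels * 16, 32),
            nn.ReLU(inplace=True),
            nn.Linear(32, 6),  # the 6 parameters of a 2x3 affine matrix
        )
        # initialize to the identity transform so training begins with no warp
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, list(x.size()), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```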
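As for the attention reconstruction loss, the abstract gives no formula; one common way to realize the stated goal (focusing the generator on regions of interest) is an L1 reconstruction term reweighted by a normalized attention map, sketched below. The function name, the use of L1, and the normalization scheme are all assumptions.

```python
import torch

def attention_recon_loss(fake: torch.Tensor, real: torch.Tensor,
                         attn_map: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical attention-weighted L1 reconstruction loss.

    fake, real: (B, C, H, W) images; attn_map: (B, 1, H, W) nonnegative
    saliency taken from the attention layers. The map is normalized per
    sample so the loss scale stays comparable across batches.
    """
    w = attn_map / (attn_map.sum(dim=(2, 3), keepdim=True) + eps)
    return (w * (fake - real).abs()).sum(dim=(2, 3)).mean()
```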
