Enhancing low-light images via skip cross-attention fusion and multi-scale lightweight transformer
Journal of Real-Time Image Processing (IF 3) Pub Date: 2024-02-27, DOI: 10.1007/s11554-024-01424-w
Jianming Zhang, Zi Xing, Mingshuang Wu, Yan Gui, Bin Zheng

Images captured under poor lighting conditions often suffer from insufficient brightness, significant noise, and color distortion, all of which are highly detrimental to subsequent high-level vision tasks. Low-light image enhancement requires effective feature extraction and fusion, and the strengths of Transformers and convolutions in image processing are complementary, so combining them for image enhancement is a promising direction. In this paper, we propose a novel UNet-like method for enhancing low-light images: Transformer blocks are stacked to form the encoder, and convolutional blocks are used in the decoder. First, since the Transformer effectively captures global information while convolution captures local information, we improve a lightweight Transformer by integrating multi-scale depth-wise convolution into its feedforward network to extract comprehensive features. Then, we design a Skip Cross-Attention Module to replace the traditional skip connection; it fuses feature maps from corresponding stages of the encoder and decoder. To achieve better feature fusion, this module employs two masks: a Top-k Mask and an adaptive V Channel Mask based on maximum entropy. The Top-k Mask filters out unfavorable features by preserving only the top k scores of the attention map, while the V Channel Mask uses the corrected V channel of the image as an illumination guide for enhancement. Finally, extensive experiments on seven datasets demonstrate that our method performs well in both subjective and objective evaluations. In particular, its runtime on the LOL-v2-real dataset shows that it comes close to real-time performance.
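To make the abstract's mechanisms concrete, the sketches below reconstruct them in PyTorch from the description alone; they are not the authors' implementation, and every name, signature, and hyperparameter (MultiScaleDWFeedForward, TopKCrossAttention, adaptive_v_channel_mask, the kernel sizes, top_k) is a hypothetical illustration.

First, one plausible reading of "integrating multi-scale depth-wise convolution into the feedforward network" is a feedforward block whose hidden layer is mixed by parallel depth-wise convolutions at several kernel sizes:

```python
import torch
import torch.nn as nn

class MultiScaleDWFeedForward(nn.Module):
    """Sketch of a Transformer feed-forward block whose hidden layer is
    mixed by parallel depth-wise convolutions at several kernel sizes,
    so globally attended features also gather multi-scale local context."""
    def __init__(self, dim, expansion=2, kernel_sizes=(3, 5, 7)):
        super().__init__()
        hidden = dim * expansion
        self.proj_in = nn.Conv2d(dim, hidden, kernel_size=1)
        # groups=hidden makes each convolution depth-wise
        self.branches = nn.ModuleList([
            nn.Conv2d(hidden, hidden, k, padding=k // 2, groups=hidden)
            for k in kernel_sizes
        ])
        self.act = nn.GELU()
        self.proj_out = nn.Conv2d(hidden, dim, kernel_size=1)

    def forward(self, x):  # x: (B, C, H, W)
        h = self.act(self.proj_in(x))
        h = sum(branch(h) for branch in self.branches)  # fuse the scales
        return self.proj_out(self.act(h))
```

Next, the skip cross-attention with a Top-k mask, reading "preserving the top k scores of the attention map" as masking everything below each query's k-th largest score before the softmax (single-head shown for brevity; assumes top_k does not exceed the number of encoder tokens):

```python
class TopKCrossAttention(nn.Module):
    """Sketch: decoder features provide queries, the matching encoder
    stage provides keys/values, and only the top-k attention scores per
    query survive; the rest are set to -inf so softmax zeroes them out."""
    def __init__(self, dim, top_k=16):
        super().__init__()
        self.scale = dim ** -0.5
        self.top_k = top_k
        self.to_q = nn.Linear(dim, dim)
        self.to_kv = nn.Linear(dim, dim * 2)

    def forward(self, dec_tokens, enc_tokens):
        # dec_tokens: (B, N, C), enc_tokens: (B, M, C), top_k <= M
        q = self.to_q(dec_tokens)
        k, v = self.to_kv(enc_tokens).chunk(2, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale         # (B, N, M)
        kth = attn.topk(self.top_k, dim=-1).values[..., -1:]  # k-th score
        attn = attn.masked_fill(attn < kth, float('-inf'))
        return attn.softmax(dim=-1) @ v                       # (B, N, C)
```

Finally, the "adaptive V Channel Mask based on maximum entropy" could plausibly mean gamma-correcting the HSV value channel and keeping the candidate with the highest histogram entropy as the illumination guide (entropy is computed over the whole batch here for brevity):

```python
def adaptive_v_channel_mask(rgb, gammas=(0.4, 0.6, 0.8, 1.0)):
    """Sketch: V channel = max over RGB; try a few gamma corrections and
    return the one whose 64-bin histogram has maximum entropy."""
    v = rgb.max(dim=1, keepdim=True).values  # (B, 1, H, W), values in [0, 1]
    best, best_entropy = v, float('-inf')
    for g in gammas:
        cand = v.clamp(min=1e-6) ** g
        hist = torch.histc(cand, bins=64, min=0.0, max=1.0)
        p = hist / hist.sum()
        entropy = -(p * (p + 1e-12).log()).sum().item()
        if entropy > best_entropy:
            best, best_entropy = cand, entropy
    return best
```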



Updated: 2024-02-28