
End-to-end optimized image compression with the frequency-oriented transform

  • Original Paper
  • Published in Machine Vision and Applications


Abstract

Image compression poses a significant challenge in the era of information explosion. Recent studies employing deep learning have demonstrated that learning-based image compression methods outperform traditional codecs. However, an inherent drawback of these methods is their lack of interpretability. Following an analysis of how compression degrades different frequency bands to different degrees, we propose an end-to-end optimized image compression model built on a frequency-oriented transform. The proposed model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, aligning with human-interpretable concepts. Leveraging the non-overlapping hypothesis, the model enables scalable coding through selective transmission of arbitrary frequency components. Extensive experiments demonstrate that our model outperforms all traditional codecs, including the next-generation standard H.266/VVC, on the MS-SSIM metric. Moreover, visual analysis tasks (i.e., object detection and semantic segmentation) verify that the proposed compression method preserves semantic fidelity in addition to signal-level precision.
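The abstract does not specify the transform's implementation, so the following is only a minimal illustrative sketch of the general idea behind a non-overlapping frequency decomposition with selective (scalable) reconstruction, in the spirit of a residual pyramid. All function names are hypothetical, and a simple box blur stands in for whatever learned or hand-designed low-pass filtering the actual model uses.

```python
import numpy as np

def box_blur(img, k=5):
    """Stand-in low-pass filter: k x k box blur with edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def frequency_decompose(img, n_bands=3):
    """Split an image into n_bands components (high to low frequency)
    whose sum reconstructs the original exactly (non-overlapping residuals)."""
    bands = []
    current = img.astype(float)
    for _ in range(n_bands - 1):
        low = box_blur(current)
        bands.append(current - low)  # high-frequency residual of this level
        current = low
    bands.append(current)            # remaining low-frequency base band
    return bands

def reconstruct(bands, keep=None):
    """Sum a subset of bands; dropping indices mimics scalable coding,
    where only some frequency components are transmitted."""
    keep = range(len(bands)) if keep is None else keep
    return sum(bands[i] for i in keep)
```

Because each band is the residual between successive low-pass levels, transmitting all bands reconstructs the signal exactly, while transmitting only the low-frequency base yields a coarse but complete preview, which is the property that makes selective transmission of frequency components possible.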



Data availability

All data generated or analyzed during this study are included in this published article (and its supplementary information files).

Change history

  • 27 February 2024

    Typo in email of author Kai Lin corrected



Acknowledgements

The authors would like to thank the associate editor and anonymous reviewers for their constructive comments to improve the quality of this paper. The authors thank Prof. Siwei Ma (Peking University) for valuable discussion and support. The authors thank Dr. Chuanmin Jia (Peking University) for assistance in experiment setup and comments on the manuscript.

Author information


Corresponding author

Correspondence to Yuefeng Zhang.



About this article


Cite this article

Zhang, Y., Lin, K. End-to-end optimized image compression with the frequency-oriented transform. Machine Vision and Applications 35, 27 (2024). https://doi.org/10.1007/s00138-023-01507-x


