
End-to-end optimized image compression with the frequency-oriented transform

  • Original Paper
  • Published in Machine Vision and Applications


Abstract

Image compression poses a significant challenge in the era of information explosion. Recent studies employing deep learning have demonstrated that learning-based image compression methods outperform traditional codecs. However, an inherent drawback of these methods is their lack of interpretability. Following an analysis of how compression degrades different frequency bands to different degrees, we propose an end-to-end optimized image compression model built on a frequency-oriented transform. The proposed model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, aligning with human-interpretable concepts. Leveraging the non-overlapping hypothesis, the model enables scalable coding through selective transmission of arbitrary frequency components. Extensive experiments demonstrate that our model outperforms all traditional codecs, including the next-generation standard H.266/VVC, on the MS-SSIM metric. Moreover, visual analysis tasks (i.e., object detection and semantic segmentation) verify that the proposed compression method preserves semantic fidelity in addition to signal-level precision.
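The abstract does not specify the transform's implementation, so the following is only a minimal illustrative sketch of the general idea behind a non-overlapping frequency decomposition with selective (scalable) reconstruction, in the spirit of a residual pyramid. All function names are hypothetical, and a simple box blur stands in for whatever learned or hand-designed low-pass filtering the actual model uses.

```python
import numpy as np

def box_blur(img, k=5):
    """Stand-in low-pass filter: k x k box blur with edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def frequency_decompose(img, n_bands=3):
    """Split an image into n_bands components (high to low frequency)
    whose sum reconstructs the original exactly (non-overlapping residuals)."""
    bands = []
    current = img.astype(float)
    for _ in range(n_bands - 1):
        low = box_blur(current)
        bands.append(current - low)  # high-frequency residual of this level
        current = low
    bands.append(current)            # remaining low-frequency base band
    return bands

def reconstruct(bands, keep=None):
    """Sum a subset of bands; dropping indices mimics scalable coding,
    where only some frequency components are transmitted."""
    keep = range(len(bands)) if keep is None else keep
    return sum(bands[i] for i in keep)
```

Because each band is the residual between successive low-pass levels, transmitting all bands reconstructs the signal exactly, while transmitting only the low-frequency base yields a coarse but complete preview, which is the property that makes selective transmission of frequency components possible.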



Data availability

All data generated or analyzed during this study are included in this published article (and its supplementary information files).

Change history

  • 27 February 2024

    Typo in email of author Kai Lin corrected



Acknowledgements

The authors would like to thank the associate editor and anonymous reviewers for their constructive comments to improve the quality of this paper. The authors thank Prof. Siwei Ma (Peking University) for valuable discussion and support. The authors thank Dr. Chuanmin Jia (Peking University) for assistance in experiment setup and comments on the manuscript.

Author information


Corresponding author

Correspondence to Yuefeng Zhang.



About this article


Cite this article

Zhang, Y., Lin, K. End-to-end optimized image compression with the frequency-oriented transform. Machine Vision and Applications 35, 27 (2024). https://doi.org/10.1007/s00138-023-01507-x


