
360° video quality assessment based on saliency-guided viewport extraction

  • Regular Paper
  • Published in Multimedia Systems

Abstract

Because projection distortion is introduced during the production of \(360^{\circ }\) video, most quality assessment algorithms designed for 2D video suffer performance degradation when applied to it. In this paper, we propose a full-reference \(360^{\circ }\) video quality assessment method that uses saliency to guide viewport extraction and thereby eliminate projection distortion. Specifically, we first predict the visual saliency of each frame with a \(360^{\circ }\) saliency prediction network, and then select the viewport that best represents the frame through the optimal viewport positioning module (OVPM). Furthermore, we propose an attention-based three-dimensional convolutional neural network (3D CNN) for quality assessment, in which 3D convolution and attention modules better capture the quality degradation of distorted viewports. Experimental results show that our method achieves superior performance on \(360^{\circ }\) video quality assessment tasks.
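The pipeline the abstract describes — predict a saliency map, pick a viewport center, then re-project that viewport so it is free of equirectangular distortion — can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the argmax center selection is a simplified stand-in for the OVPM, sampling is nearest-neighbor for brevity, and all function names are hypothetical.

```python
import numpy as np

def saliency_peak_center(sal):
    """Pick the viewport center as the argmax of an equirectangular
    saliency map (a simplified stand-in for the paper's OVPM)."""
    h, w = sal.shape
    i, j = np.unravel_index(np.argmax(sal), sal.shape)
    lat = (0.5 - (i + 0.5) / h) * np.pi       # +pi/2 (top) .. -pi/2 (bottom)
    lon = ((j + 0.5) / w - 0.5) * 2 * np.pi   # -pi .. +pi
    return lat, lon

def extract_viewport(erp, lat0, lon0, fov=np.pi / 2, size=64):
    """Sample a square viewport from an equirectangular frame via the
    inverse gnomonic projection (nearest-neighbor, for brevity)."""
    h, w = erp.shape[:2]
    f = (size / 2) / np.tan(fov / 2)          # pinhole focal length in pixels
    u = np.arange(size) - (size - 1) / 2
    x, y = np.meshgrid(u, -u)                 # image-plane pixel coordinates
    rho = np.sqrt(x**2 + y**2)
    c = np.arctan2(rho, f)                    # angular distance from the center
    # Inverse gnomonic projection: tangent-plane point -> sphere (lat, lon).
    lat = np.arcsin(np.clip(
        np.cos(c) * np.sin(lat0)
        + np.where(rho > 0, y * np.sin(c) / np.maximum(rho, 1e-12), 0.0)
        * np.cos(lat0), -1.0, 1.0))
    lon = lon0 + np.arctan2(
        x * np.sin(c),
        rho * np.cos(lat0) * np.cos(c) - y * np.sin(lat0) * np.sin(c))
    # Sphere -> equirectangular pixel indices (nearest neighbor).
    i = np.clip(((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
    j = ((lon / (2 * np.pi) + 0.5) % 1.0 * w).astype(int)
    return erp[i, j]
```

Because the viewport is sampled on the tangent plane of the sphere, straight lines near the saliency peak stay straight, which is the distortion-removal property that motivates viewport-based assessment in the first place.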



Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

Notes

  1. Available: https://github.com/Samsung/360tools.


Acknowledgements

This work was supported in part by the NSFC under Grant 62371279, 62171002, 61901252, 62071287, 62020106011, 62371278, and Science and Technology Commission of Shanghai Municipality under Grant 22ZR1424300.

Author information

Authors and Affiliations

Authors

Contributions

Fanxi Yang: contributed to the conception of the study, performed the experiments, and wrote the manuscript. Chao Yang: contributed significantly to the analysis and wrote the manuscript. Ping An: helped perform the analysis with constructive discussions and reviewed the manuscript. Xinpeng Huang: helped perform the analysis with constructive discussions and reviewed the manuscript.

Corresponding author

Correspondence to Chao Yang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Communicated by Q. Shen.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, F., Yang, C., An, P. et al. 360° video quality assessment based on saliency-guided viewport extraction. Multimedia Systems 30, 89 (2024). https://doi.org/10.1007/s00530-024-01285-0


  • DOI: https://doi.org/10.1007/s00530-024-01285-0
