
Self-Supervised Monocular Depth Estimation by Digging into Uncertainty Quantification

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Self-supervised monocular depth estimation has made great progress thanks to well-designed network architectures and objective functions. However, existing methods lack a specific mechanism for learning about regions that contain moving objects or occlusions, and therefore tend to produce poor results there. We propose an uncertainty quantification method that improves the performance of existing depth estimation networks without changing their architectures. Our method consists of uncertainty measurement, uncertainty-guided learning, and uncertainty-based adaptive determination of the final prediction. First, using the Snapshot and Siam learning strategies, we measure the degree of uncertainty as the variance across pre-converged epochs or twin networks during training. Second, we use the uncertainty to guide the network to strengthen its learning of the more uncertain regions. Finally, we use the uncertainty to adaptively produce the final depth estimation results with a balance of accuracy and robustness. To demonstrate the effectiveness of our uncertainty quantification method, we apply it to two state-of-the-art models, Monodepth2 and Hints. Experimental results show that our method improves depth estimation performance on seven evaluation metrics over the two baseline models and outperforms the existing uncertainty estimation method.
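The three stages described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes uncertainty is the per-pixel variance across depth maps from several pre-converged snapshots (or twin networks), that guidance means up-weighting the training error where uncertainty is high, and that the adaptive final prediction blends toward the snapshot mean in uncertain regions. All function names and the weighting scheme are hypothetical.

```python
import numpy as np

def pixelwise_uncertainty(depth_snapshots):
    """Per-pixel variance across depth maps predicted by several
    pre-converged snapshots (or Siamese twins) of the same network."""
    stack = np.stack(depth_snapshots, axis=0)  # shape (K, H, W)
    return stack.var(axis=0)                   # shape (H, W)

def uncertainty_weighted_loss(photometric_error, uncertainty, alpha=1.0):
    """Up-weight the per-pixel training error where uncertainty is high,
    so learning is strengthened on moving-object / occlusion regions."""
    weights = 1.0 + alpha * uncertainty / (uncertainty.max() + 1e-8)
    return float((weights * photometric_error).mean())

def adaptive_depth(depth_snapshots, uncertainty):
    """Final prediction: keep the last snapshot where uncertainty is low
    (accuracy), blend toward the snapshot mean where it is high
    (robustness)."""
    stack = np.stack(depth_snapshots, axis=0)
    w = uncertainty / (uncertainty.max() + 1e-8)  # 0 = certain, 1 = uncertain
    return (1.0 - w) * stack[-1] + w * stack.mean(axis=0)
```

For instance, two snapshots that disagree on a pixel (depths 0 and 1) yield a variance of 0.25 there, so that pixel's error is up-weighted during training and its final depth is pulled toward the snapshot mean.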



Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Fei Luo or Chun-Xia Xiao.

Additional information

Yuan-Zhen Li and Sheng-Jie Zheng are co-first authors. Sheng-Jie Zheng proposed the initial idea, completed part of the coding work, and wrote the initial version of the paper. Yuan-Zhen Li improved the method, completed the remaining coding work, conducted the experiments, and produced the final version of the paper. Both authors contributed equally to this work.

This work was co-supervised by Chun-Xia Xiao and Fei Luo.

Supplementary Information

ESM 1

(PDF 2250 kb)


About this article


Cite this article

Li, YZ., Zheng, SJ., Tan, ZX. et al. Self-Supervised Monocular Depth Estimation by Digging into Uncertainty Quantification. J. Comput. Sci. Technol. 38, 510–525 (2023). https://doi.org/10.1007/s11390-023-3088-y

