
Self-Supervised Monocular Depth Estimation by Digging into Uncertainty Quantification

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Self-supervised monocular depth estimation has made great progress thanks to well-designed network architectures and objective functions. However, existing methods lack a specific mechanism for learning about regions that contain moving objects or occlusions, and therefore tend to produce poor results there. We propose an uncertainty quantification method that improves the performance of existing depth estimation networks without changing their architectures. Our method consists of uncertainty measurement, uncertainty-guided learning, and uncertainty-based adaptive determination of the final prediction. First, using the Snapshot and Siam learning strategies, we measure the degree of uncertainty as the variance across pre-converged epochs or twin networks during training. Second, we use the uncertainty to guide the network to strengthen its learning of the more uncertain regions. Finally, we use the uncertainty to adaptively produce the final depth estimation results with a balance of accuracy and robustness. To demonstrate the effectiveness of our uncertainty quantification method, we apply it to two state-of-the-art models, Monodepth2 and Hints. Experimental results show that our method improves depth estimation performance on seven evaluation metrics over the two baseline models and outperforms the existing uncertainty estimation method.
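The three stages described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes uncertainty is the per-pixel variance across depth maps from several pre-converged snapshots (or twin networks), that guidance means up-weighting the training error where uncertainty is high, and that the adaptive final prediction blends toward the snapshot mean in uncertain regions. All function names and the weighting scheme are hypothetical.

```python
import numpy as np

def pixelwise_uncertainty(depth_snapshots):
    """Per-pixel variance across depth maps predicted by several
    pre-converged snapshots (or Siamese twins) of the same network."""
    stack = np.stack(depth_snapshots, axis=0)  # shape (K, H, W)
    return stack.var(axis=0)                   # shape (H, W)

def uncertainty_weighted_loss(photometric_error, uncertainty, alpha=1.0):
    """Up-weight the per-pixel training error where uncertainty is high,
    so learning is strengthened on moving-object / occlusion regions."""
    weights = 1.0 + alpha * uncertainty / (uncertainty.max() + 1e-8)
    return float((weights * photometric_error).mean())

def adaptive_depth(depth_snapshots, uncertainty):
    """Final prediction: keep the last snapshot where uncertainty is low
    (accuracy), blend toward the snapshot mean where it is high
    (robustness)."""
    stack = np.stack(depth_snapshots, axis=0)
    w = uncertainty / (uncertainty.max() + 1e-8)  # 0 = certain, 1 = uncertain
    return (1.0 - w) * stack[-1] + w * stack.mean(axis=0)
```

For instance, two snapshots that disagree on a pixel (depths 0 and 1) yield a variance of 0.25 there, so that pixel's error is up-weighted during training and its final depth is pulled toward the snapshot mean.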



Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Fei Luo or Chun-Xia Xiao.

Additional information

Yuan-Zhen Li and Sheng-Jie Zheng are co-first authors. Sheng-Jie Zheng proposed the initial idea, completed part of the coding work, and wrote the initial version of the paper. Yuan-Zhen Li improved the method, completed the remaining coding work, conducted the experiments, and produced the final version of the paper. Both authors contributed equally to this work.

This work was co-supervised by Chun-Xia Xiao and Fei Luo.

Supplementary Information

ESM 1

(PDF 2250 kb)


About this article


Cite this article

Li, YZ., Zheng, SJ., Tan, ZX. et al. Self-Supervised Monocular Depth Estimation by Digging into Uncertainty Quantification. J. Comput. Sci. Technol. 38, 510–525 (2023). https://doi.org/10.1007/s11390-023-3088-y

