
Keyframe-based RGB-D dense visual SLAM fused semantic cues in dynamic scenes

  • RESEARCH
  • Published in: Machine Vision and Applications

Abstract

The robustness of dense visual SLAM remains a challenging problem in dynamic environments. In this paper, we propose a novel keyframe-based dense visual SLAM system that handles highly dynamic environments using an RGB-D camera. The proposed method combines cluster-based residual models with semantic cues to detect dynamic objects, yielding motion segmentation that outperforms traditional methods. The method also employs a motion-segmentation-based keyframe selection strategy and a frame-to-keyframe matching scheme that reduce the influence of dynamic objects and thereby minimize trajectory errors. We further filter out the influence of dynamic objects based on the motion segmentation and then use true matches from keyframes near the current keyframe to facilitate loop closure. Finally, a pose graph is established and optimized using the g2o framework. Our experimental results demonstrate the success of our approach on highly dynamic sequences, as evidenced by more robust motion segmentation and significantly lower trajectory drift compared with several state-of-the-art dense visual odometry and SLAM methods on challenging public benchmark datasets.
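The pipeline in the abstract (cluster-wise residuals fused with semantic cues, then motion-segmentation-aware keyframe selection) can be made concrete with a short sketch. The following minimal Python sketch is written for this summary, not taken from the paper: every function name, every threshold, and the OR-style fusion rule are assumptions chosen for illustration only.

    # Hypothetical sketch: names, thresholds, and the fusion rule below are
    # assumptions for illustration, not the authors' implementation.
    import numpy as np

    def segment_motion(cluster_residuals, semantic_dynamic_prob, sem_thresh=0.5):
        # A cluster is labeled dynamic if its alignment residual is an outlier
        # among clusters OR a semantic detector (e.g., a "person" mask) marks
        # it as a movable object.
        geometric = cluster_residuals > 2.0 * np.median(cluster_residuals)
        semantic = semantic_dynamic_prob > sem_thresh
        return geometric | semantic  # True = dynamic cluster

    def static_ratio(dynamic_mask, cluster_sizes):
        # Fraction of pixels that belong to clusters labeled static.
        return cluster_sizes[~dynamic_mask].sum() / cluster_sizes.sum()

    def need_new_keyframe(static_frac, overlap_with_keyframe,
                          min_static=0.6, min_overlap=0.7):
        # Motion-segmentation-aware selection: spawn a keyframe when the
        # static overlap with the current keyframe drops, but only from a
        # frame that is mostly static, keeping dynamic objects out of
        # keyframes.
        return overlap_with_keyframe < min_overlap and static_frac > min_static

    # Toy numbers for four clusters; cluster 2 (say, a walking person) is
    # both a residual outlier and semantically dynamic.
    residuals = np.array([0.02, 0.03, 0.40, 0.02])
    sem_prob = np.array([0.05, 0.10, 0.90, 0.02])
    sizes = np.array([5000, 3000, 1500, 500])

    dynamic = segment_motion(residuals, sem_prob)
    frac = static_ratio(dynamic, sizes)
    print(dynamic, frac, need_new_keyframe(frac, overlap_with_keyframe=0.65))

Under this reading, frame-to-keyframe tracking would match only the static clusters of the current frame against the keyframe, and loop-closure candidates from nearby keyframes would be verified using matches from their static parts before the pose graph is optimized (with g2o in the paper).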



Funding

This work was supported by the Space Science Advance Research Fund under Grant XDA15014700, funded by the Chinese Academy of Sciences Strategic Leading Science and Technology Project.

Author information


Contributions

Wugen Zhou discussed the idea of this work, performed most of the experiments, and wrote most of the manuscript. Xiaodong Peng discussed the idea of this work and provided valuable methodological guidance. Yun Li supported the algorithm implementation, and Mingrui Fan prepared some of the figures. Bo Liu reviewed the manuscript.

Corresponding author

Correspondence to Xiaodong Peng.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhou, W., Peng, X., Li, Y. et al. Keyframe-based RGB-D dense visual SLAM fused semantic cues in dynamic scenes. Machine Vision and Applications 35, 47 (2024). https://doi.org/10.1007/s00138-024-01526-2

