
Keyframe-based RGB-D dense visual SLAM fused semantic cues in dynamic scenes

  • RESEARCH
  • Published in: Machine Vision and Applications

Abstract

The robustness of dense visual SLAM remains a challenging problem in dynamic environments. In this paper, we propose a novel keyframe-based dense visual SLAM system that handles highly dynamic environments using an RGB-D camera. The proposed method combines cluster-based residual models with semantic cues to detect dynamic objects, yielding motion segmentation that outperforms traditional methods. The method also employs a motion-segmentation-based keyframe selection strategy and a frame-to-keyframe matching scheme that reduce the influence of dynamic objects and thereby minimize trajectory errors. We further filter out the influence of dynamic objects based on the motion segmentation and then use true matches from keyframes near the current keyframe to facilitate loop closure. Finally, a pose graph is established and optimized using the g2o framework. Our experimental results demonstrate the success of our approach on highly dynamic sequences, as evidenced by more robust motion segmentation and significantly lower trajectory drift compared with several state-of-the-art dense visual odometry and SLAM methods on challenging public benchmark datasets.
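The pipeline in the abstract (cluster-wise residuals fused with semantic cues, then motion-segmentation-aware keyframe selection) can be made concrete with a short sketch. The following minimal Python sketch is written for this summary, not taken from the paper: every function name, every threshold, and the OR-style fusion rule are assumptions chosen for illustration only.

    # Hypothetical sketch: names, thresholds, and the fusion rule below are
    # assumptions for illustration, not the authors' implementation.
    import numpy as np

    def segment_motion(cluster_residuals, semantic_dynamic_prob, sem_thresh=0.5):
        # A cluster is labeled dynamic if its alignment residual is an outlier
        # among clusters OR a semantic detector (e.g., a "person" mask) marks
        # it as a movable object.
        geometric = cluster_residuals > 2.0 * np.median(cluster_residuals)
        semantic = semantic_dynamic_prob > sem_thresh
        return geometric | semantic  # True = dynamic cluster

    def static_ratio(dynamic_mask, cluster_sizes):
        # Fraction of pixels that belong to clusters labeled static.
        return cluster_sizes[~dynamic_mask].sum() / cluster_sizes.sum()

    def need_new_keyframe(static_frac, overlap_with_keyframe,
                          min_static=0.6, min_overlap=0.7):
        # Motion-segmentation-aware selection: spawn a keyframe when the
        # static overlap with the current keyframe drops, but only from a
        # frame that is mostly static, keeping dynamic objects out of
        # keyframes.
        return overlap_with_keyframe < min_overlap and static_frac > min_static

    # Toy numbers for four clusters; cluster 2 (say, a walking person) is
    # both a residual outlier and semantically dynamic.
    residuals = np.array([0.02, 0.03, 0.40, 0.02])
    sem_prob = np.array([0.05, 0.10, 0.90, 0.02])
    sizes = np.array([5000, 3000, 1500, 500])

    dynamic = segment_motion(residuals, sem_prob)
    frac = static_ratio(dynamic, sizes)
    print(dynamic, frac, need_new_keyframe(frac, overlap_with_keyframe=0.65))

Under this reading, frame-to-keyframe tracking would match only the static clusters of the current frame against the keyframe, and loop-closure candidates from nearby keyframes would be verified using matches from their static parts before the pose graph is optimized (with g2o in the paper).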



Funding

This work was supported by the Space Science Advance Research Fund under Grant XDA15014700, funded by the Chinese Academy of Sciences Strategic Leading Science and Technology Project.

Author information


Contributions

Wugen Zhou discussed the idea of this work, performed most of the experiments, and wrote most of the manuscript. Xiaodong Peng discussed the idea of this work and provided valuable methodological guidance. Yun Li supported the algorithm implementation, and Mingrui Fan prepared some of the figures. Bo Liu reviewed the manuscript.

Corresponding author

Correspondence to Xiaodong Peng.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhou, W., Peng, X., Li, Y. et al. Keyframe-based RGB-D dense visual SLAM fused semantic cues in dynamic scenes. Machine Vision and Applications 35, 47 (2024). https://doi.org/10.1007/s00138-024-01526-2

