Abstract
The robustness of dense visual SLAM remains a challenging problem in dynamic environments. In this paper, we propose a novel keyframe-based dense visual SLAM method that handles highly dynamic environments using an RGB-D camera. The proposed method combines cluster-based residual models with semantic cues to detect dynamic objects, yielding motion segmentation that outperforms traditional methods. The method also employs a motion-segmentation-based keyframe selection strategy and a frame-to-keyframe matching scheme that reduce the influence of dynamic objects, thus minimizing trajectory errors. We further filter out the influence of dynamic objects based on motion segmentation and then employ true matches from keyframes near the current keyframe to facilitate loop closure. Finally, a pose graph is established and optimized using the g2o framework. Our experimental results demonstrate the success of our approach in handling highly dynamic sequences, as evidenced by more robust motion segmentation results and significantly lower trajectory drift compared to several state-of-the-art dense visual odometry and SLAM methods on challenging public benchmark datasets.
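The abstract's core idea of fusing per-cluster residuals with semantic cues can be illustrated with a minimal sketch. This is not the paper's actual model: the function name, the threshold `tau`, and the simple fusion rule (high residual, or semantically movable with a moderately elevated residual) are all illustrative assumptions.

```python
import numpy as np

def segment_dynamic_clusters(residuals, semantic_movable, tau=0.1):
    """Label scene clusters as dynamic (True) or static (False).

    residuals:        (N,) mean photometric/geometric residual per cluster
                      after alignment against the reference keyframe
    semantic_movable: (N,) bool, True if a cluster overlaps a semantic mask
                      of a movable class (e.g. 'person')
    tau:              residual threshold (illustrative value)
    """
    residuals = np.asarray(residuals, dtype=float)
    semantic_movable = np.asarray(semantic_movable, dtype=bool)
    high_residual = residuals > tau
    # Semantically movable clusters are flagged even at a lower residual,
    # so a person who briefly pauses is still treated as dynamic.
    suspicious = semantic_movable & (residuals > 0.5 * tau)
    return high_residual | suspicious

# Example: three clusters -- static background, fast-moving object,
# and a slowly moving person caught by the semantic prior.
labels = segment_dynamic_clusters([0.02, 0.20, 0.07],
                                  [False, False, True])
print(labels.tolist())  # [False, True, True]
```

In the paper's pipeline, pixels belonging to clusters flagged this way would be excluded from frame-to-keyframe matching and loop-closure candidate matching before pose-graph optimization.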
Funding
This work was supported by the Space Science Advance Research Fund under Grant XDA15014700 funded by the Chinese Academy of Sciences Strategic Leading Science and Technology Project.
Author information
Authors and Affiliations
Contributions
Wugen Zhou discussed the idea of this work, performed most of the experiments, and wrote most of the manuscript. Xiaodong Peng discussed the idea of this work and provided valuable methodological guidance. Yun Li supported the algorithm implementation, and Mingrui Fan prepared some figures. Bo Liu reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhou, W., Peng, X., Li, Y. et al. Keyframe-based RGB-D dense visual SLAM fused semantic cues in dynamic scenes. Machine Vision and Applications 35, 47 (2024). https://doi.org/10.1007/s00138-024-01526-2