EPAWFusion: multimodal fusion for 3D object detection based on enhanced points and adaptive weights
Xiang Sun, Shaojing Song, Fan Wu, Tingting Lu, Bohao Li, Zhiqing Miao
Abstract

Fusing LiDAR point clouds and camera images for 3D object detection in autonomous driving has emerged as a captivating research avenue. The core challenge of multimodal fusion is how to seamlessly fuse the 3D LiDAR point cloud with the 2D camera image. Although current approaches exhibit promising results, they often rely on fusion at only the data level, feature level, or object level, and there is still room for improvement in the utilization of multimodal information. We present an advanced and effective multimodal fusion framework, EPAWFusion, that fuses the 3D point cloud and 2D camera image at both the data level and the feature level. The EPAWFusion model consists of three key modules: a point-enhancement module based on semantic segmentation for data-level fusion, an adaptive weight allocation module for feature-level fusion, and a detector based on 3D sparse convolution. The semantic information of the 2D image is extracted using semantic segmentation, and the calibration matrix is used to establish the point-pixel correspondence. The semantic and distance information are then attached to the point cloud to achieve data-level fusion. The geometry features of the enhanced point cloud are extracted by voxel encoding, and the texture features of the image are obtained using a pretrained 2D CNN. Feature-level fusion is achieved via the adaptive weight allocation module, and the fused features are fed into a 3D sparse convolution-based detector to obtain accurate 3D detections. Experimental results demonstrate that EPAWFusion outperforms the baseline network MVXNet on the KITTI dataset for 3D detection of cars, pedestrians, and cyclists by 5.81%, 6.97%, and 3.88%, respectively. EPAWFusion also performs well for single-vehicle-side 3D object detection on the DAIR-V2X dataset, and the inference frame rate of the proposed model reaches 11.1 FPS. The two-level fusion of EPAWFusion significantly enhances the performance of multimodal 3D object detection.
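The two fusion stages described in the abstract can be sketched roughly as follows. This is an illustrative sketch only: the function names, array shapes, and the sigmoid gate used for adaptive weighting are our assumptions, not the authors' implementation.

```python
import numpy as np

def enhance_points(points, sem_map, P):
    """Data-level fusion sketch: attach per-pixel semantic scores and
    point range to each LiDAR point via a 3x4 calibration matrix P
    that projects 3D points into the image plane."""
    n = points.shape[0]
    homo = np.hstack([points[:, :3], np.ones((n, 1))])   # (n, 4) homogeneous coords
    uvw = homo @ P.T                                     # project onto image plane
    uv = uvw[:, :2] / uvw[:, 2:3]                        # perspective divide
    h, w = sem_map.shape[:2]
    u = np.clip(uv[:, 0].astype(int), 0, w - 1)          # pixel column per point
    v = np.clip(uv[:, 1].astype(int), 0, h - 1)          # pixel row per point
    sem = sem_map[v, u]                                  # (n, c) semantic scores
    dist = np.linalg.norm(points[:, :3], axis=1, keepdims=True)  # range to sensor
    return np.hstack([points, sem, dist])                # enhanced point cloud

def adaptive_weight_fusion(geo_feat, tex_feat, w_logits):
    """Feature-level fusion sketch: a learned gate (given here as logits)
    balances geometry and texture features elementwise."""
    w = 1.0 / (1.0 + np.exp(-w_logits))                  # sigmoid gate in (0, 1)
    return w * geo_feat + (1.0 - w) * tex_feat
```

In a full pipeline the gate logits would be predicted by a small network from the concatenated features; here they are passed in directly to keep the sketch self-contained.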

© 2024 Society of Photo-Optical Instrumentation Engineers (SPIE)
Xiang Sun, Shaojing Song, Fan Wu, Tingting Lu, Bohao Li, and Zhiqing Miao "EPAWFusion: multimodal fusion for 3D object detection based on enhanced points and adaptive weights," Journal of Applied Remote Sensing 18(1), 017501 (13 January 2024). https://doi.org/10.1117/1.JRS.18.017501
Received: 30 June 2023; Accepted: 21 December 2023; Published: 13 January 2024
KEYWORDS: Object detection, Point clouds, Image fusion, Feature fusion, LiDAR, Cameras, Image segmentation

