Resolution-sensitive self-supervised monocular absolute depth estimation
Applied Intelligence (IF 5.3) Pub Date: 2024-04-05, DOI: 10.1007/s10489-024-05414-0
Yuquan Zhou, Chentao Zhang, Lianjun Deng, Jianji Fu, Hongyi Li, Zhouyi Xu, Jianhuan Zhang

Depth estimation is an essential component of computer vision applications such as environment perception, 3D reconstruction, and scene understanding. Among the available methods, self-supervised monocular depth estimation is notable for its cost-effectiveness, ease of deployment, and data accessibility. However, current methods face two challenges. First, the scale factor of self-supervised monocular depth estimation is uncertain, which poses significant difficulties for practical applications. Second, depth prediction accuracy on high-resolution images remains unsatisfactory, resulting in low utilization of computational resources. We propose a novel solution to these challenges with three specific contributions. First, an interleaved skip-connection structure and a new decoder for the depth network are proposed to improve depth prediction accuracy on high-resolution images. Second, a vertical data-splicing module is proposed as a data augmentation method that exposes the model to more non-vertical features and improves generalization. Third, a scale recovery module is proposed to recover accurate absolute depth without additional sensors, resolving the uncertainty of the scale factor. Experimental results demonstrate that the proposed framework significantly improves prediction accuracy on high-resolution images, with the novel network structure and the vertical data-splicing module contributing most to this improvement. Moreover, in scenarios where the camera height is fixed and the ground is flat, the scale recovery module performs comparably to using ground truth. Overall, the RSANet framework offers a promising solution to the existing challenges in self-supervised monocular depth estimation.
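The abstract does not detail how the scale recovery module works, but a common way to recover absolute scale from a fixed camera height over flat ground is to back-project the predicted (relative-scale) depth of the lower image region into 3D, estimate the camera's height above the ground plane in relative units, and rescale so that it matches the known metric height. The sketch below illustrates this general idea under those assumptions; the function name, parameters, and the median-based height estimate are illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

def recover_scale(depth, K, known_cam_height, ground_rows=0.25):
    """Estimate a metric scale factor for a relative depth map, assuming
    a flat ground plane and a known, fixed camera height.
    (Illustrative sketch, not the paper's exact scale recovery module.)

    depth            : (H, W) predicted depth map in relative units
    K                : (3, 3) camera intrinsic matrix
    known_cam_height : true camera height above the ground, in meters
    ground_rows      : fraction of bottom rows assumed to show ground
    """
    h, w = depth.shape
    # Back-project pixels from the lower part of the image, where the
    # ground plane is most likely visible.
    v0 = int(h * (1.0 - ground_rows))
    us, vs = np.meshgrid(np.arange(w), np.arange(v0, h))
    z = depth[v0:h, :]
    x = (us - K[0, 2]) * z / K[0, 0]
    y = (vs - K[1, 2]) * z / K[1, 1]
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # With a roughly level camera, the y-coordinate of ground points
    # approximates the camera height in relative units; the median is
    # robust to non-ground outliers such as vehicles or pedestrians.
    est_height = np.median(pts[:, 1])
    return known_cam_height / est_height
```

Multiplying the relative depth map by the returned factor yields absolute depth, which matches the abstract's claim that no additional sensors are needed when camera height and ground flatness can be assumed.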



Updated: 2024-04-06