The Applications of 3D Input Data and Scalability Element by Transformer Based Methods: A Review
Archives of Computational Methods in Engineering (IF 9.7), Pub Date: 2024-04-23, DOI: 10.1007/s11831-024-10108-4
Abubakar Sulaiman Gezawa, Chibiao Liu, Naveed Ur Rehman Junejo, Haruna Chiroma

The outstanding effectiveness of transformers in visual tasks has led to their rapid growth and adoption in three-dimensional (3D) vision tasks. Vision transformers have shown numerous advantages over earlier convolutional neural network (CNN) architectures, including broad and more substantial modelling capabilities, complementarity with convolution, scalability with model and data size, and better connectivity, all of which have helped improve performance records on many visual tasks. We present a thorough review that classifies and summarizes popular transformer-based approaches according to key features of transformer integration: the input data, the scalability element that enables transformer processing, the architectural design, and the context level at which the transformer operates. We also highlight the primary contributions of each approach. Furthermore, we compare the results of these techniques with commonly employed non-transformer techniques in 3D object classification, segmentation, and object detection on standard 3D datasets, including ModelNet, SUN RGB-D, ScanNet, nuScenes, Waymo, ShapeNet, S3DIS, and KITTI. The study also discusses the limitations of 3D vision transformers and a number of potential future directions.

Updated: 2024-04-24
