Lightweight and fast visual detection method for 3C assembly
Displays (IF 4.3) Pub Date: 2024-01-02, DOI: 10.1016/j.displa.2023.102631
Wenbai Chen, Genjian Yang, Bo Zhang, Jingchen Li, Yiqun Wang, Haobin Shi

In 3C assembly scenarios, which are characterized by numerous semi-flexible, heterogeneous, and small slender targets, traditional target detection algorithms face significant challenges: low accuracy, weak generalization, large model sizes, and slow inference. To address these issues, this study introduces an enhanced method based on the YOLOv5 model, named YOLOv5-GTB. The method integrates a Bidirectional Feature Pyramid Network (BiFPN), Ghost lightweight convolution, a Vision Transformer (ViT), and an adaptive activation function to improve both the accuracy and the speed of target detection. Ghost convolution is used to construct a Ghost bottleneck layer, optimizing the feature extraction network while significantly reducing computational cost and strengthening the convolutional neural network's feature extraction capability. In addition, the Cross-Stage Partial Network (CSPNet) architecture splits the data flow of the input feature map, improving the efficiency of gradient processing. A fused CNN-Transformer structure combines the strength of convolutional neural networks in local feature extraction with the ViT's capability for capturing long-range dependencies, further enhancing the network's overall feature extraction performance. For feature fusion, because a traditional top-down unidirectional information flow cannot effectively merge features carrying both location and semantic information, BiFPN is incorporated into YOLOv5-GTB; it strengthens the fusion of the feature layers extracted at the end of the backbone with the first module's fused feature layer, improving detection accuracy. Ablation and comparative experiments on the 3C assembly target detection dataset demonstrate the clear advantages of the YOLOv5-GTB model in both accuracy and speed.
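The Ghost-convolution idea the abstract relies on (from GhostNet) is to spend a full convolution on only a fraction of the output channels, then synthesize the remaining "ghost" channels with cheap per-channel operations. The following is a minimal numpy sketch of that idea, not the authors' implementation: a 1x1 channel-mixing step stands in for the primary convolution, and a per-channel scaling stands in for the cheap depthwise transform; all names and the ratio=2 default are illustrative.

```python
import numpy as np

def ghost_module(x, out_channels, ratio=2, rng=None):
    """Sketch of a Ghost module: a costly transform produces only
    out_channels // ratio 'intrinsic' feature maps; the rest are
    derived from them by cheap per-channel linear ops.
    x: input feature map of shape (c_in, h, w).
    Assumes ratio=2 so the ghost half maps one-to-one onto the
    intrinsic half."""
    rng = rng if rng is not None else np.random.default_rng(0)
    c_in, h, w = x.shape
    c_int = out_channels // ratio                    # intrinsic channels
    # primary step: 1x1 conv == channel mixing (stand-in for a real conv)
    w_primary = rng.standard_normal((c_int, c_in))
    intrinsic = np.tensordot(w_primary, x, axes=1)   # (c_int, h, w)
    # cheap step: one scaled copy per remaining output channel
    scales = rng.standard_normal(out_channels - c_int)
    ghost = scales[:, None, None] * intrinsic[: out_channels - c_int]
    return np.concatenate([intrinsic, ghost], axis=0)

out = ghost_module(np.ones((3, 8, 8)), out_channels=16)
# out has 16 channels, but only 8 required a full (costly) convolution
```

The saving is in the primary step: halving the channels that pass through the full convolution roughly halves its multiply-accumulate cost, which is why the paper uses Ghost bottlenecks to shrink the model for fast inference.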
Ultimately, the application of this model to the 3C assembly platform successfully achieves rapid and accurate target recognition in this scenario.
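The BiFPN the paper incorporates (introduced in EfficientDet) merges feature levels with fast normalized fusion: each input gets a learnable non-negative weight, and the weights are normalized by their sum rather than a softmax. A minimal numpy sketch of that fusion step, under the assumption that the inputs have already been resized to a common shape; it is not the authors' code:

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN-style fast normalized fusion:
    O = sum_i (w_i * I_i) / (sum_j w_j + eps), with w_i >= 0.
    features: list of same-shape arrays; weights: one scalar each."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # keep weights non-negative
    w = w / (w.sum() + eps)                                # cheap normalization, no softmax
    return sum(wi * f for wi, f in zip(w, features))

# two same-shape feature maps fused with weights (2.0, 1.0):
f_top_down = np.ones((4, 4))     # semantically strong, coarse
f_bottom_up = np.zeros((4, 4))   # location-precise, fine
fused = fast_normalized_fusion([f_top_down, f_bottom_up], [2.0, 1.0])
# every element is close to 2/3
```

Stacking this fusion in both top-down and bottom-up directions is what lets BiFPN combine location-rich shallow features with semantically rich deep ones, addressing the unidirectional-flow limitation the abstract describes.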
