A transformer-based UAV instance segmentation model TF-YOLOv7
Signal, Image and Video Processing (IF 2.3). Pub Date: 2024-02-09. DOI: 10.1007/s11760-023-02992-3
Li Tan , Zikang Liu , Xiaokai Huang , Dongfang Li , Feifei Wang

Abstract

In dense target scenes in real cities, efficiently labeling distinct targets while overcoming the mutual occlusion those dense targets cause is the key difficulty of UAV instance segmentation. To address the mutual occlusion of targets, this paper proposes TF-YOLOv7, a transformer-based model for UAV instance segmentation. The model introduces the Swin Transformer structure into the backbone network, constructing hierarchical feature maps by fusing deep-network feature blocks, which is well suited to the dense recognition task of instance segmentation. In addition, the Bottleneck Transformer structure is introduced in the detection stage: convolution extracts abstract information from the underlying features, and a self-attention mechanism then processes the higher-level information obtained through the convolutional layers, which handles high-resolution images effectively. Finally, the Focal-EIoU loss function is introduced to further optimize mask quality on mutually occluded small targets, improving the segmentation of occluded targets. Experimental validation on the UAV aerial-photography dataset VisDroneDET shows a 2.2% performance improvement over the YOLOv7 baseline, demonstrating that the model is suitable for UAV instance segmentation tasks.
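The hierarchical Swin Transformer backbone the abstract refers to is built on window-based self-attention: attention is computed inside non-overlapping local windows, so cost grows roughly linearly with image area rather than quadratically, which is what makes dense-prediction tasks on large aerial images tractable. The sketch below is a minimal, hypothetical PyTorch illustration of that mechanism, not the paper's code; the shifted-window step and relative position biases of the full Swin design are omitted, and the window size, dimensions, and head count are assumptions.

```python
import torch
import torch.nn as nn

def window_partition(x, ws):
    """(N, H, W, C) -> (N * H//ws * W//ws, ws*ws, C): each window becomes a token sequence."""
    n, h, w, c = x.shape
    x = x.view(n, h // ws, ws, w // ws, ws, c)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, c)

def window_reverse(win, ws, n, h, w):
    """Inverse of window_partition: stitch windows back into a feature map."""
    x = win.view(n, h // ws, w // ws, ws, ws, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(n, h, w, -1)

class WindowAttention(nn.Module):
    """Self-attention restricted to local windows, in the spirit of Swin."""
    def __init__(self, dim, ws=7, heads=4):
        super().__init__()
        self.ws = ws
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (N, H, W, C), with H and W divisible by ws
        n, h, w, _ = x.shape
        win = window_partition(x, self.ws)    # attend only within each window
        out, _ = self.attn(win, win, win)
        return window_reverse(out, self.ws, n, h, w)

wa = WindowAttention(dim=96, ws=7, heads=4)
y = wa(torch.randn(1, 28, 28, 96))            # -> (1, 28, 28, 96)
```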
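The "Bottleneck Transformer structure" in the detection stage pairs convolution with self-attention in the way the abstract describes: 1x1 convolutions compress and abstract the features, then multi-head self-attention relates all spatial positions of the compressed map. The following is a minimal sketch in the spirit of BoTNet (Srinivas et al., 2021), where the bottleneck's spatial convolution is replaced by self-attention; the channel sizes, head count, and SiLU activation are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class MHSA2d(nn.Module):
    """Multi-head self-attention applied to an (N, C, H, W) feature map."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        n, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)   # (N, H*W, C): each pixel is a token
        out, _ = self.attn(seq, seq, seq)    # global attention across all positions
        return out.transpose(1, 2).reshape(n, c, h, w)

class BoTBlock(nn.Module):
    """Residual bottleneck: 1x1 conv down, self-attention, 1x1 conv up."""
    def __init__(self, channels, mid_channels, heads=4):
        super().__init__()
        self.reduce = nn.Conv2d(channels, mid_channels, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_channels)
        self.mhsa = MHSA2d(mid_channels, heads)
        self.expand = nn.Conv2d(mid_channels, channels, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = nn.SiLU()                 # YOLOv7-style activation (an assumption)

    def forward(self, x):
        y = self.act(self.bn1(self.reduce(x)))  # convolution abstracts low-level features
        y = self.mhsa(y)                         # attention processes the higher-level map
        y = self.bn2(self.expand(y))
        return self.act(x + y)                   # residual connection

blk = BoTBlock(256, 64)
out = blk(torch.randn(1, 256, 20, 20))           # -> (1, 256, 20, 20)
```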
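The Focal-EIoU loss (Zhang et al., 2021) combines the EIoU regression loss, whose IoU, center-distance, width, and height penalty terms are each normalized by the smallest enclosing box, with a focal weight IoU^γ that down-weights low-overlap predictions so that the many poor-quality boxes in dense, occluded scenes do not dominate the gradient. A minimal PyTorch sketch follows; the corner box format, γ = 0.5, and the detached focal weight are common implementation choices assumed here, not details taken from the paper.

```python
import torch

def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-7):
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2)."""
    # IoU between predicted and ground-truth boxes
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest enclosing box: its width, height, and diagonal normalize the penalties
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0]) + eps
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1]) + eps
    c2 = cw ** 2 + ch ** 2

    # Center-distance penalty
    px = (pred[:, 0] + pred[:, 2]) / 2
    py = (pred[:, 1] + pred[:, 3]) / 2
    tx = (target[:, 0] + target[:, 2]) / 2
    ty = (target[:, 1] + target[:, 3]) / 2
    rho2 = (px - tx) ** 2 + (py - ty) ** 2

    # Width and height penalties
    pw, ph = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    tw, th = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]

    eiou = 1 - iou + rho2 / c2 + (pw - tw) ** 2 / cw ** 2 + (ph - th) ** 2 / ch ** 2
    # Focal reweighting: low-IoU (often occluded) boxes contribute less
    return (iou.detach() ** gamma * eiou).mean()

loss = focal_eiou_loss(torch.tensor([[0., 0., 10., 10.]]),
                       torch.tensor([[2., 2., 12., 12.]]))
```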



