MLGT: multi-local guided tracker for visual object tracking, Journal of Real-Time Image Processing


Journal of Real-Time Image Processing (IF 3), Pub Date: 2024-03-19, DOI: 10.1007/s11554-024-01418-8
Xingzhu Liang, Miaomiao Chen, Erhu Liu

Existing single-stream tracking pipelines improve performance by performing feature extraction and interaction jointly. They establish a bidirectional information flow between the template frame and the search frame, exploiting the correlation and dynamic changes between the two to model and represent the object better, which improves tracking accuracy and robustness. However, these pipelines use only the highest-level semantic information from the encoder; low-level features serve only to compute new activations, which cannot meet the fine-grained requirements of the tracking task. To address this issue, we propose the multi-local guided tracker (MLGT), which merges features obtained at various depths to strengthen the interaction between different levels of semantic information. Specifically, we divide the single-stream pipeline into fixed output stages, each responsible for extracting and processing features at a different level. We then pass the output features into an enhanced fusion module (EFM), which incorporates a shared encoder and a concatenation operation. The encoder further extracts information from the joint features, and the concatenation operation fuses features from the different output stages. We conduct extensive evaluations on five datasets; on the LaSOT dataset we achieve 70.5% SUC, which is 1.4% higher than the existing single-stream tracker OSTrack.
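The staged extraction and fusion described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The abstract does not give the backbone depth, the number of output stages, or the axis of concatenation, so the 12 blocks, the stage boundaries at depths 4, 8 and 12, and the per-token channel-wise concatenation are all assumptions. The names `make_block`, `run_stages` and `enhanced_fusion` are hypothetical, and `make_block` is a toy stand-in for a real transformer encoder layer.

```python
import random


def make_block(dim, seed):
    """Toy stand-in for one encoder block (NOT a real transformer layer)."""
    rng = random.Random(seed)
    w = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

    def block(tokens):
        n = len(tokens)
        # Mean over all tokens: a crude substitute for self-attention, so that
        # template and search tokens exchange information in both directions.
        mean = [sum(t[j] for t in tokens) / n for j in range(dim)]
        out = []
        for t in tokens:
            mixed = [t[j] + mean[j] for j in range(dim)]
            proj = [sum(mixed[i] * w[i][j] for i in range(dim)) for j in range(dim)]
            out.append([t[j] + proj[j] for j in range(dim)])  # residual connection
        return out

    return block


def run_stages(tokens, blocks, stage_ends):
    """Run the backbone and keep the joint features at each stage boundary."""
    stage_outputs = []
    for depth, block in enumerate(blocks, start=1):
        tokens = block(tokens)
        if depth in stage_ends:
            stage_outputs.append(tokens)
    return stage_outputs


def enhanced_fusion(stage_outputs, shared_encoder):
    """Refine every stage output with one shared encoder, then concatenate.

    Concatenation is done per token along the channel axis (an assumption).
    """
    refined = [shared_encoder(feat) for feat in stage_outputs]
    n_tokens = len(refined[0])
    return [[x for feat in refined for x in feat[k]] for k in range(n_tokens)]


dim = 4
template = [[0.1 * (i + j) for j in range(dim)] for i in range(2)]  # 2 template tokens
search = [[0.05 * (i - j) for j in range(dim)] for i in range(6)]   # 6 search tokens
joint = template + search  # single-stream input: template and search tokens together

blocks = [make_block(dim, seed=s) for s in range(12)]         # assumed depth: 12
stage_outputs = run_stages(joint, blocks, stage_ends={4, 8, 12})  # assumed stages
fused = enhanced_fusion(stage_outputs, make_block(dim, seed=99))

print(len(fused), len(fused[0]))  # 8 tokens, each with 3 * dim = 12 channels
```

Using one shared encoder for all stages keeps the parameter count independent of the number of stages, while the concatenated output carries shallow, middle and deep features side by side for a downstream prediction head.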


Updated: 2024-03-21
