SingleS2R: Single sample driven Sim-to-Real transfer for Multi-Source Visual-Tactile Information Understanding using multi-scale vision transformers
Information Fusion (IF 18.6) Pub Date: 2024-03-28, DOI: 10.1016/j.inffus.2024.102390
Jing Tang, Zeyu Gong, Bo Tao, Zhouping Yin

Due to variations in light transmission and wear on the contact head, existing visual-tactile dataset building methods typically require a large amount of real-world data, making the dataset building process time-consuming and labor-intensive. Sim-to-Real learning has been proposed to realize Multi-Source Visual-Tactile Information Understanding (MSVTIU) across simulated and real environments, which can efficiently promote visual-tactile dataset building through simulation for emerging robotic applications. However, existing Sim-to-Real learning still requires more than 10,000 real samples, and the corresponding data must be re-collected whenever the sensor version changes. To address this challenge, we propose a powerful Sim-to-Real transfer method for MSVTIU that requires only a single real-world tactile sample. To effectively extract features from this single real tactile sample, a multi-scale vision transformer-based Generative Adversarial Network (GAN) is proposed to address the MSVTIU task under extremely limited data. We introduce a novel scale-dependent self-attention mechanism that allows attention layers to adapt their behavior at different stages of the generation process. In addition, we introduce a residual block for capturing contextual information between adjacent scales, which utilizes shortcut connections to fully preserve texture and structure information. We further enhance the model's understanding of visual-tactile information using an elastic transform and an adaptive adversarial training strategy, both designed specifically for MSVTIU. Experiments on two public datasets with diverse objects indicate that our Sim-to-Real transfer approach, utilizing only a single real-world visual-tactile sample, outperforms state-of-the-art methods that require tens of thousands of samples.
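
The abstract names two architectural ideas: attention layers that adapt their behavior per generator scale, and a residual block that carries texture and structure between adjacent scales via shortcut connections. The paper's actual implementation is not given here; the following is a minimal PyTorch sketch of one plausible reading, in which a softmax temperature is tied to the pyramid scale and the coarser feature map is upsampled and shortcut-added into the finer one. All class names, shapes, and the temperature schedule are illustrative assumptions.

```python
# A minimal sketch (not the authors' code) of scale-dependent self-attention
# and a cross-scale residual block. Everything here is an assumption about
# how such components could look, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaleDependentSelfAttention(nn.Module):
    """Self-attention whose softmax temperature depends on the pyramid scale.

    Assumption: coarser scales (low scale_idx) use a softer attention
    distribution to capture global layout, finer scales a sharper one for
    local texture. The paper may condition attention differently.
    """

    def __init__(self, dim: int, num_heads: int, scale_idx: int, num_scales: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Temperature decays linearly from 2.0 (coarsest) to 1.0 (finest).
        self.temperature = 2.0 - scale_idx / max(num_scales - 1, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim). Dividing the queries by the temperature
        # divides the attention logits, softening the softmax at coarse scales.
        q = x / self.temperature
        out, _ = self.attn(q, x, x, need_weights=False)
        return out


class CrossScaleResidualBlock(nn.Module):
    """Residual block fusing the previous (coarser) scale into the current one.

    The coarser feature map is upsampled and added through a shortcut so that
    structure from earlier scales is preserved alongside newly generated texture.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, x_fine: torch.Tensor, x_coarse: torch.Tensor) -> torch.Tensor:
        # Upsample the coarser map to the finer resolution, then shortcut-add.
        skip = F.interpolate(x_coarse, size=x_fine.shape[-2:],
                             mode="bilinear", align_corners=False)
        return x_fine + skip + self.body(x_fine + skip)


if __name__ == "__main__":
    attn = ScaleDependentSelfAttention(dim=64, num_heads=4, scale_idx=0, num_scales=5)
    tokens = torch.randn(2, 16 * 16, 64)          # flattened 16x16 feature map
    print(attn(tokens).shape)                     # torch.Size([2, 256, 64])

    block = CrossScaleResidualBlock(dim=64)
    fine, coarse = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 16, 16)
    print(block(fine, coarse).shape)              # torch.Size([2, 64, 32, 32])
```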

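The abstract also mentions an elastic transform designed for MSVTIU. As a rough illustration of the general augmentation idea, the sketch below applies torchvision's stock ElasticTransform to a simulated tactile image; the alpha/sigma values are arbitrary assumptions, and the paper's tactile-specific variant is presumably different.

```python
# A minimal sketch of elastic-transform augmentation using torchvision's
# stock ElasticTransform as a stand-in for the paper's MSVTIU-specific
# variant. The alpha/sigma values are illustrative, not the paper's settings.
import torch
from torchvision.transforms import ElasticTransform

# Mild displacement field: a small alpha keeps the contact geometry plausible
# while still deforming the gel-like surface texture.
augment = ElasticTransform(alpha=30.0, sigma=4.0)

fake_tactile = torch.rand(3, 224, 224)   # a simulated tactile image in [0, 1]
warped = augment(fake_tactile)
print(warped.shape)                      # torch.Size([3, 224, 224])
```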
Updated: 2024-03-28