Cross-Modal Retrieval With Noisy Correspondence via Consistency Refining and Mining,IEEE Transactions on Image Processing


Cross-Modal Retrieval With Noisy Correspondence via Consistency Refining and Mining
IEEE Transactions on Image Processing (IF 10.6), Pub Date: 2024-03-25, DOI: 10.1109/tip.2024.3374221
Xinran Ma, Mouxing Yang, Yunfan Li, Peng Hu, Jiancheng Lv, Xi Peng


The success of existing cross-modal retrieval (CMR) methods relies heavily on the assumption that the annotated cross-modal correspondence is faultless. In practice, however, the correspondence of some pairs is inevitably contaminated during data collection or annotation, leading to the so-called Noisy Correspondence (NC) problem. To alleviate the influence of NC, we propose a novel method termed Consistency REfining And Mining (CREAM) that reveals and exploits the difference between correspondence and consistency. Specifically, correspondence and consistency coincide only for true positive and true negative pairs, and differ for false positive and false negative pairs. Based on this observation, CREAM employs a collaborative learning paradigm to detect and rectify the correspondence of positives, and a negative mining approach to explore and utilize the consistency. Thanks to this consistency refining and mining strategy, overfitting to the false positives can be prevented and the consistency rooted in the false negatives can be exploited, yielding a robust CMR method. Extensive experiments verify the effectiveness of our method on three image-text benchmarks: Flickr30K, MS-COCO, and Conceptual Captions. Furthermore, we apply our method to the graph matching task, and the results demonstrate its robustness against the fine-grained NC problem. The code is available at https://github.com/XLearning-SCU/2024-TIP-CREAM.
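
The distinction between correspondence (what the annotation says) and consistency (whether the two samples actually share semantics) can be illustrated with a minimal sketch. This is only an illustration of the four pair types named in the abstract, not code from the CREAM repository; the names `PairType`, `pair_type`, and `is_noisy` are hypothetical, and the sketch assumes the true consistency is known, which is not the case in practice and is exactly what a method such as CREAM has to estimate.

```python
from enum import Enum


class PairType(Enum):
    TRUE_POSITIVE = "true positive"
    FALSE_POSITIVE = "false positive"
    FALSE_NEGATIVE = "false negative"
    TRUE_NEGATIVE = "true negative"


def pair_type(annotated_match: bool, semantically_consistent: bool) -> PairType:
    """Classify a cross-modal pair (hypothetical helper, for illustration only).

    annotated_match: the correspondence given by the dataset annotation.
    semantically_consistent: the (normally unobserved) true consistency.
    """
    if annotated_match and semantically_consistent:
        return PairType.TRUE_POSITIVE
    if annotated_match and not semantically_consistent:
        # Annotated as a match, but the contents do not agree.
        return PairType.FALSE_POSITIVE
    if not annotated_match and semantically_consistent:
        # Treated as a negative, but the contents do agree.
        return PairType.FALSE_NEGATIVE
    return PairType.TRUE_NEGATIVE


def is_noisy(annotated_match: bool, semantically_consistent: bool) -> bool:
    """Correspondence and consistency differ only for false positives/negatives."""
    return annotated_match != semantically_consistent
```

In this framing, false positives are the pairs whose annotated correspondence needs rectifying, while false negatives are the pairs whose consistency negative mining could exploit.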


Updated: 2024-03-25
