Robust cross-modal retrieval with alignment refurbishment
Frontiers of Information Technology & Electronic Engineering (IF 3) Pub Date: 2023-11-07, DOI: 10.1631/fitee.2200514
Jinyi Guo, Jieyu Ding

Cross-modal retrieval aims to retrieve data across modalities by establishing a consistent alignment between different modal data. Many cross-modal retrieval methods have been proposed and achieve excellent results; however, they are trained on clean cross-modal pairs, which are semantically matched but costly to collect, in contrast to easily available data with noisy alignment (i.e., paired but semantically mismatched). When trained on noisily aligned data, the performance of these methods degrades dramatically. We therefore propose robust cross-modal retrieval with alignment refurbishment (RCAR), which significantly reduces the impact of noise on the model. Specifically, RCAR first performs multi-task learning to slow down overfitting to the noise, making clean and noisy data separable. RCAR then fits a two-component beta-mixture model to divide the pairs into clean and noisy alignments, and refurbishes each label according to the posterior probability of the noise component. In addition, we define partial and complete noise in the noisy-alignment paradigm. Experimental results show that, compared with popular cross-modal retrieval methods, RCAR achieves more robust performance under both types of noise.
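The abstract does not give implementation details, but the core refurbishment step (fitting a two-component beta mixture to per-pair scores and taking the posterior of the noise component) can be sketched as follows. This is a hypothetical illustration, not the authors' code: the EM procedure, the method-of-moments M-step, and the use of normalized per-pair loss as the mixture input are all assumptions for the sake of the example.

```python
import numpy as np
from scipy.stats import beta as beta_dist

def fit_beta_mixture(x, n_iter=50, eps=1e-6):
    """Fit a two-component beta mixture to scores x in (0, 1) with EM.

    Component 0 is initialized with a low mean (assumed clean pairs),
    component 1 with a high mean (assumed noisy pairs), so the ordering
    of the components is preserved during fitting.
    """
    x = np.clip(x, eps, 1 - eps)
    params = [(2.0, 5.0), (5.0, 2.0)]        # (a, b) per component
    weights = np.array([0.5, 0.5])           # mixing proportions
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each pair
        pdf = np.stack([beta_dist.pdf(x, a, b) for a, b in params], axis=1)
        num = weights * pdf
        resp = num / np.maximum(num.sum(axis=1, keepdims=True), eps)
        # M-step: weighted method-of-moments update of each beta component
        new_params = []
        for k in range(2):
            w = resp[:, k]
            m = np.sum(w * x) / np.maximum(w.sum(), eps)          # weighted mean
            v = np.sum(w * (x - m) ** 2) / np.maximum(w.sum(), eps)  # weighted var
            v = max(v, eps)
            common = m * (1 - m) / v - 1
            new_params.append((max(m * common, eps), max((1 - m) * common, eps)))
        params = new_params
        weights = resp.mean(axis=0)
    return weights, params

def noise_posterior(x, weights, params, eps=1e-6):
    """Posterior probability that each pair belongs to the noise component."""
    x = np.clip(x, eps, 1 - eps)
    pdf = np.stack([beta_dist.pdf(x, a, b) for a, b in params], axis=1)
    num = weights * pdf
    return num[:, 1] / np.maximum(num.sum(axis=1), eps)

# Toy data: clean pairs tend to have low normalized loss, noisy pairs high.
rng = np.random.default_rng(0)
losses = np.concatenate([rng.beta(2, 8, 800), rng.beta(8, 2, 200)])
w, p = fit_beta_mixture(losses)
post = noise_posterior(losses, w, p)
# Refurbish: use the posterior of being clean as a soft alignment label.
refurbished = 1.0 - post
```

In this sketch a pair with a high posterior under the noise component receives a small refurbished label, so it contributes less to the retrieval loss, which matches the abstract's description of reducing the impact of noisy alignments.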




Updated: 2023-11-07