Abstract
Deep cross-modal hashing has advanced multi-modal retrieval thanks to its retrieval efficiency and low storage cost, but its vulnerability to backdoor attacks is rarely studied. Notably, current deep cross-modal hashing methods require large-scale training data, so poisoned samples carrying imperceptible triggers can easily be camouflaged into the training set to bury backdoors in the victim model. However, existing backdoor attacks target the uni-modal vision domain, and the cross-modal gap and hash quantization weaken their attack performance. To address these challenges, this article presents an invisible black-box backdoor attack against deep cross-modal hashing retrieval; to the best of our knowledge, it is the first attempt in this research field. Specifically, we develop a flexible trigger generator that produces the attacker-specified triggers and learns the sample semantics of the non-poisoned modality to bridge the cross-modal attack gap. We then devise an input-aware injection network that embeds the generated triggers into benign samples in a sample-specific, stealthy form and realizes cross-modal semantic interaction between triggers and poisoned samples. Because the attack is agnostic to the victim model's internals, any cross-modal hashing knockoff can serve as a surrogate to mount the black-box backdoor attack and mitigate the attack degradation caused by hash quantization. Moreover, we propose a confusing perturbation and mask strategy that induces high-performance victim models to focus on the imperceptible triggers in poisoned samples. Extensive experiments on benchmark datasets demonstrate that our method achieves state-of-the-art attack performance against deep cross-modal hashing retrieval. In addition, we investigate transferable attacks, few-shot poisoning, multi-modal poisoning, perceptibility, and potential defenses against backdoor attacks. Our code and datasets are available at https://github.com/tswang0116/IB3A.
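To make the described pipeline concrete, below is a minimal PyTorch sketch of the two components the abstract names: a trigger generator conditioned on the non-poisoned (text) modality, and an input-aware injection network that blends the trigger into a benign image through a sample-specific mask. All module architectures, the text feature dimensionality, and the perturbation budget are illustrative assumptions for exposition, not the authors' released IB3A implementation (see the GitHub repository for that).

```python
# Hypothetical sketch of cross-modal backdoor poisoning; shapes and
# hyper-parameters are assumptions, not the paper's actual design.
import torch
import torch.nn as nn


class TriggerGenerator(nn.Module):
    """Maps text-side semantics of a benign pair to an image-space trigger."""

    def __init__(self, text_dim=1386, img_size=224):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(text_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 3 * img_size * img_size), nn.Tanh(),
        )

    def forward(self, text_feat):
        # Trigger values lie in [-1, 1]; reshape to an image-like tensor.
        t = self.net(text_feat)
        return t.view(-1, 3, self.img_size, self.img_size)


class InjectionNetwork(nn.Module):
    """Blends a trigger into a benign image via a sample-specific mask."""

    def __init__(self, eps=8 / 255):
        super().__init__()
        self.eps = eps  # imperceptibility budget (assumed value)
        self.mask_net = nn.Sequential(
            nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, image, trigger):
        # The mask depends on both the image and its trigger, so every
        # poisoned sample receives a different, input-aware perturbation.
        mask = self.mask_net(torch.cat([image, trigger], dim=1))
        poisoned = image + self.eps * mask * trigger
        return poisoned.clamp(0, 1)


# Usage: derive triggers from the paired (non-poisoned) text modality and
# inject them into benign images. The poisoned pairs would then train a
# cross-modal hashing knockoff, so the backdoor transfers to an unseen
# black-box victim model.
gen, inj = TriggerGenerator(), InjectionNetwork()
text_feat = torch.randn(4, 1386)        # e.g., BoW text vectors (assumption)
images = torch.rand(4, 3, 224, 224)     # benign image batch
poisoned = inj(images, gen(text_feat))  # sample-specific poisoned images
```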