Research Article

Invisible Black-Box Backdoor Attack against Deep Cross-Modal Hashing Retrieval

Published: 26 April 2024

Abstract

Deep cross-modal hashing has advanced multi-modal retrieval thanks to its excellent retrieval efficiency and low storage cost, but its vulnerability to backdoor attacks is rarely studied. Notably, current deep cross-modal hashing methods inevitably require large-scale training data, so poisoned samples carrying imperceptible triggers can easily be camouflaged in the training data to bury backdoors in the victim model. However, existing backdoor attacks target the uni-modal vision domain, and both the multi-modal gap and hash quantization weaken their attack performance. To address these challenges, this article presents an invisible black-box backdoor attack against deep cross-modal hashing retrieval; to the best of our knowledge, it is the first attempt in this research field. Specifically, we develop a flexible trigger generator that produces the attacker-specified triggers; it learns the sample semantics of the non-poisoned modality to bridge the cross-modal attack gap. We then devise an input-aware injection network that embeds the generated triggers into benign samples in a sample-specific, stealthy manner and realizes cross-modal semantic interaction between triggers and poisoned samples. Because the attack requires no knowledge of the victim model, any cross-modal hashing knockoff can facilitate the black-box backdoor attack and alleviate the attack weakening caused by hash quantization. Moreover, we propose a confusing perturbation and mask strategy that induces high-performance victim models to focus on the imperceptible triggers in poisoned samples. Extensive experiments on benchmark datasets demonstrate that our method achieves state-of-the-art attack performance against deep cross-modal hashing retrieval. In addition, we investigate the influence of transferable attacks, few-shot poisoning, multi-modal poisoning, perceptibility, and potential defenses on backdoor attacks. Our codes and datasets are available at https://github.com/tswang0116/IB3A.
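
To make the poisoning pipeline sketched in the abstract concrete, below is a minimal, illustrative PyTorch sketch of its two core components: a trigger generator conditioned on non-poisoned-modality semantics (e.g., a text or label embedding), and an input-aware injection network that blends the trigger into a benign image as a bounded, sample-specific residual. All names here (TriggerGenerator, InjectionNet, sem_dim, eps) are hypothetical and chosen for illustration; this is not the authors' implementation, which is available in the linked repository.

    # Hypothetical sketch of the poisoning pipeline described in the abstract.
    # Names are illustrative, NOT taken from the authors' IB3A code.
    import torch
    import torch.nn as nn

    class TriggerGenerator(nn.Module):
        """Maps non-poisoned-modality semantics (e.g., a text/label
        embedding) to an image-shaped trigger pattern."""
        def __init__(self, sem_dim: int = 512, img_size: int = 224):
            super().__init__()
            self.fc = nn.Linear(sem_dim, 3 * 32 * 32)
            self.up = nn.Sequential(
                nn.Upsample(size=(img_size, img_size), mode="bilinear",
                            align_corners=False),
                nn.Conv2d(3, 3, kernel_size=3, padding=1),
                nn.Tanh(),  # keep trigger values in [-1, 1]
            )

        def forward(self, sem: torch.Tensor) -> torch.Tensor:
            x = self.fc(sem).view(-1, 3, 32, 32)  # coarse trigger map
            return self.up(x)                     # upsample to image size

    class InjectionNet(nn.Module):
        """Input-aware injection: blends the trigger into a benign image
        as a sample-specific, low-magnitude residual so the poisoned
        image stays visually imperceptible."""
        def __init__(self, eps: float = 8 / 255):
            super().__init__()
            self.eps = eps  # perturbation budget (an assumed value)
            self.mix = nn.Conv2d(6, 3, kernel_size=3, padding=1)

        def forward(self, image: torch.Tensor,
                    trigger: torch.Tensor) -> torch.Tensor:
            # Residual depends on both the image and the trigger,
            # making the injected pattern sample-specific.
            residual = torch.tanh(self.mix(torch.cat([image, trigger], dim=1)))
            poisoned = image + self.eps * residual  # bounded perturbation
            return poisoned.clamp(0.0, 1.0)

    # Usage: poison a batch of benign images with attacker-chosen semantics.
    gen, inj = TriggerGenerator(), InjectionNet()
    images = torch.rand(4, 3, 224, 224)   # benign images in [0, 1]
    target_sem = torch.randn(4, 512)      # embedding of target semantics
    poisoned = inj(images, gen(target_sem))
    print(poisoned.shape, (poisoned - images).abs().max().item())

The bounded tanh residual is one simple way to keep the trigger imperceptible; the paper's actual injection and its confusing perturbation and mask strategy are more elaborate than this sketch.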

Published in

ACM Transactions on Information Systems, Volume 42, Issue 4
July 2024, 751 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/3613639

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 3 September 2023
• Revised: 6 January 2024
• Accepted: 21 February 2024
• Online AM: 2 March 2024
• Published: 26 April 2024
