
Unsupervised Multimodal Machine Translation for Low-resource Distant Language Pairs


Abstract

Unsupervised machine translation (UMT) has recently attracted increasing attention from researchers because it enables models to translate between languages that lack parallel corpora. However, existing work mainly considers closely related language pairs (e.g., English-German and English-French), and the effectiveness of visual content for distant language pairs has yet to be investigated. This article proposes an unsupervised multimodal machine translation model for low-resource distant language pairs. Specifically, we first apply measures such as transliteration and re-ordering to bring distant language pairs closer together. We then use visual content to extend masked language modeling, yielding a visual masked language modeling objective for UMT. Finally, empirical experiments are conducted on our distant-language-pair dataset and the public Multi30k dataset. The results demonstrate the superior performance of our model, with BLEU improvements of 2.5 and 2.6 on translation for the distant language pairs English-Uyghur and Chinese-Uyghur, respectively. Moreover, our model also achieves remarkable results on close language pairs, improving on existing models by 2.3 BLEU on English-German.
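To make the visual masked language modeling idea concrete, the sketch below shows one plausible way such an objective can be set up: caption tokens are randomly masked and a Transformer encoder must recover them while also attending to projected image region features (e.g., Faster R-CNN regions). This is a minimal illustrative sketch only, not the authors' implementation; the masking rate, feature dimensions, layer counts, and all names introduced here (mask_tokens, VisualMLM) are assumptions made for the example.

# Illustrative sketch of a visual masked language modeling step (assumptions only).
import torch
import torch.nn as nn

PAD_ID, MASK_ID, VOCAB = 0, 1, 10_000

def mask_tokens(tokens, ratio=0.15):
    # Replace a random fraction of non-pad tokens with MASK_ID; positions that
    # are not masked get label -100 so the loss ignores them.
    mask = (torch.rand(tokens.shape) < ratio) & (tokens != PAD_ID)
    labels = torch.where(mask, tokens, torch.full_like(tokens, -100))
    corrupted = torch.where(mask, torch.full_like(tokens, MASK_ID), tokens)
    return corrupted, labels

class VisualMLM(nn.Module):
    def __init__(self, d_model=512, img_dim=2048):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB, d_model, padding_idx=PAD_ID)
        self.img_proj = nn.Linear(img_dim, d_model)  # project image region features into the text space
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.lm_head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens, img_feats):
        # Encode projected image regions and token embeddings jointly;
        # predictions are made only at the text positions.
        x = torch.cat([self.img_proj(img_feats), self.tok_emb(tokens)], dim=1)
        h = self.encoder(x)[:, img_feats.size(1):]
        return self.lm_head(h)

# Toy usage: one caption of 20 tokens paired with 36 image regions.
tokens = torch.randint(2, VOCAB, (1, 20))
corrupted, labels = mask_tokens(tokens)
logits = VisualMLM()(corrupted, torch.randn(1, 36, 2048))
loss = nn.functional.cross_entropy(logits.view(-1, VOCAB), labels.view(-1), ignore_index=-100)

In the approach described in the abstract, transliteration and re-ordering of the distant-language side would precede such pretraining; those language-specific steps are not sketched here.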




• Published in

  ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 4
  April 2024
  221 pages
  ISSN: 2375-4699
  EISSN: 2375-4702
  DOI: 10.1145/3613577
  • Editor: Imed Zitouni

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 15 April 2024
      • Online AM: 9 March 2024
      • Accepted: 5 March 2024
      • Revised: 26 February 2024
      • Received: 7 November 2023
Published in TALLIP Volume 23, Issue 4


      Qualifiers

      • research-article