
Unsupervised Multimodal Machine Translation for Low-resource Distant Language Pairs


Abstract

Unsupervised machine translation (UMT) has recently attracted increasing attention from researchers because it enables models to translate between languages that lack parallel corpora. However, existing work mainly considers closely related language pairs (e.g., English-German and English-French), and the effectiveness of visual content for distant language pairs has yet to be investigated. This article proposes an unsupervised multimodal machine translation model for low-resource distant language pairs. Specifically, we first apply measures such as transliteration and re-ordering to bring distant language pairs closer together. We then use visual content to extend masked language modeling, yielding a visual masked language modeling objective for UMT. Finally, empirical experiments are conducted on our distant-language-pair dataset and the public Multi30k dataset. The results demonstrate the superior performance of our model, with BLEU improvements of 2.5 and 2.6 on translation for the distant language pairs English-Uyghur and Chinese-Uyghur, respectively. Moreover, our model also achieves remarkable results on close language pairs, improving on existing models by 2.3 BLEU on English-German.
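To make the visual masked language modeling idea concrete, the sketch below shows one plausible way such an objective can be set up: caption tokens are randomly masked and a Transformer encoder must recover them while also attending to projected image region features (e.g., Faster R-CNN regions). This is a minimal illustrative sketch only, not the authors' implementation; the masking rate, feature dimensions, layer counts, and all names introduced here (mask_tokens, VisualMLM) are assumptions made for the example.

# Illustrative sketch of a visual masked language modeling step (assumptions only).
import torch
import torch.nn as nn

PAD_ID, MASK_ID, VOCAB = 0, 1, 10_000

def mask_tokens(tokens, ratio=0.15):
    # Replace a random fraction of non-pad tokens with MASK_ID; positions that
    # are not masked get label -100 so the loss ignores them.
    mask = (torch.rand(tokens.shape) < ratio) & (tokens != PAD_ID)
    labels = torch.where(mask, tokens, torch.full_like(tokens, -100))
    corrupted = torch.where(mask, torch.full_like(tokens, MASK_ID), tokens)
    return corrupted, labels

class VisualMLM(nn.Module):
    def __init__(self, d_model=512, img_dim=2048):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB, d_model, padding_idx=PAD_ID)
        self.img_proj = nn.Linear(img_dim, d_model)  # project image region features into the text space
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.lm_head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens, img_feats):
        # Encode projected image regions and token embeddings jointly;
        # predictions are made only at the text positions.
        x = torch.cat([self.img_proj(img_feats), self.tok_emb(tokens)], dim=1)
        h = self.encoder(x)[:, img_feats.size(1):]
        return self.lm_head(h)

# Toy usage: one caption of 20 tokens paired with 36 image regions.
tokens = torch.randint(2, VOCAB, (1, 20))
corrupted, labels = mask_tokens(tokens)
logits = VisualMLM()(corrupted, torch.randn(1, 36, 2048))
loss = nn.functional.cross_entropy(logits.view(-1, VOCAB), labels.view(-1), ignore_index=-100)

In the approach described in the abstract, transliteration and re-ordering of the distant-language side would precede such pretraining; those language-specific steps are not sketched here.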




• Published in

  ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 4
  April 2024
  221 pages
  ISSN: 2375-4699
  EISSN: 2375-4702
  DOI: 10.1145/3613577
  • Editor: Imed Zitouni

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 15 April 2024
      • Online AM: 9 March 2024
      • Accepted: 5 March 2024
      • Revised: 26 February 2024
      • Received: 7 November 2023
Published in TALLIP Volume 23, Issue 4


      Qualifiers

      • research-article