Exploiting multiple correlated modalities can enhance low-resource machine translation quality

Cited by: 0
Authors
Loitongbam Sanayai Meetei
Thoudam Doren Singh
Sivaji Bandyopadhyay
Affiliations
[1] National Institute of Technology Silchar, Center for Natural Language Processing (CNLP)
[2] National Institute of Technology Silchar, Department of Computer Science and Engineering
[3] Jadavpur University, Department of Computer Science and Engineering
Keywords
Multimodal machine translation; Image-guided; Speech-guided; Low-resource; Manipuri; Hindi; Neural machine translation
DOI
Not available
Abstract
In an effort to enhance the machine translation (MT) quality of low-resource languages, we report the first study on multimodal machine translation (MMT) for the Manipuri→English, Manipuri→Hindi and Manipuri→German language pairs. Manipuri is a morphologically rich, resource-constrained language with few resources that can be used computationally, and no MMT dataset has been reported for these language pairs to date. To build the parallel datasets, we collected news articles containing images and associated English text from a local daily newspaper and used English as a pivot language. The outputs of existing MT systems for these languages were then manually post-edited to build the datasets. In addition to text, we build MT systems that exploit features from images and from audio recordings in the source language, Manipuri. We carried out an extensive analysis of MT systems trained with text-only and multimodal inputs, using both automatic metrics and human evaluation.
Our findings show that integrating multiple correlated modalities improves MT performance in low-resource settings, with a significant improvement of up to +3 BLEU points. Human assessment revealed that the fluency score of the MMT systems depends on the type of correlated auxiliary modality.
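The abstract reports gains in BLEU but does not state how the scores were computed. The following is a minimal sketch of corpus-level BLEU (uniform 1- to 4-gram weights, brevity penalty) that illustrates what a "+3 BLEU" difference between a text-only and a multimodal system compares. It makes several assumptions not taken from the paper: one reference per sentence, whitespace tokenization, and no smoothing. The function names `ngrams` and `corpus_bleu` and the toy sentences are hypothetical. Reported scores are normally produced with a standard tool such as sacreBLEU, whose tokenization and defaults differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU on a 0-100 scale; assumes one reference per hypothesis,
    whitespace tokenization and no smoothing (illustrative only)."""
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    # Hypothetical toy outputs, not data from the paper.
    refs = ["the cat sat on the mat"]
    text_only = ["the cat sat on a mat"]
    multimodal = ["the cat sat on the mat"]
    delta = corpus_bleu(multimodal, refs) - corpus_bleu(text_only, refs)
    print(f"BLEU gain of the multimodal system: {delta:+.2f}")
```

A system-level claim such as "+3 BLEU" is the difference between two such corpus scores computed on the same test set and references. Scores are comparable only when the tokenization and metric settings match.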
Pages: 13137-13157
Number of pages: 20