Exploiting multiple correlated modalities can enhance low-resource machine translation quality

Cited by: 0
Authors
Loitongbam Sanayai Meetei
Thoudam Doren Singh
Sivaji Bandyopadhyay
Affiliations
[1] National Institute of Technology Silchar, Center for Natural Language Processing (CNLP)
[2] National Institute of Technology Silchar, Department of Computer Science and Engineering
[3] Jadavpur University, Department of Computer Science and Engineering
Source
Multimedia Tools and Applications
Keywords
Multimodal machine translation; Image-guided; Speech-guided; Low-resource; Manipuri; Hindi; Neural machine translation;
DOI
Not available
Abstract
In an effort to enhance the machine translation (MT) quality of low-resource languages, we report the first study on multimodal machine translation (MMT) for the Manipuri→English, Manipuri→Hindi and Manipuri→German language pairs. Manipuri is a morphologically rich language with limited resources that can be computationally utilized, and no MMT dataset has been reported for these language pairs to date. To build the parallel datasets, we collected news articles containing images and associated text in English from a local daily newspaper and used English as a pivot language. The machine-translated outputs of the existing translation systems for these languages were manually post-edited to build the datasets. In addition to text, we build MT systems by exploiting features from images and audio recordings in the source language, i.e., Manipuri. We carried out an extensive analysis of the MT systems trained with text-only and multimodal inputs using automatic metrics and human evaluation. Our findings attest that integrating multiple correlated modalities enhances MT system performance in low-resource settings, achieving a significant improvement of up to +3 BLEU. The human assessment revealed that the fluency score of the MMT systems depends on the type of correlated auxiliary modality.
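For illustration only (not taken from the paper): a minimal sketch of one common way to fuse a pre-extracted image or audio feature vector with the source text in a Transformer-based NMT model, assuming PyTorch; vocabulary sizes, feature dimensions, and the fusion-by-prepending strategy are illustrative assumptions, and positional encodings and attention masks are omitted for brevity. The authors' actual architecture and feature extractors may differ.

import torch
import torch.nn as nn

class MultimodalNMT(nn.Module):
    """Toy text+feature NMT model: a global modality vector is projected and
    prepended to the source sequence as an extra 'token' (illustrative only)."""
    def __init__(self, src_vocab, tgt_vocab, d_model=256, feat_dim=2048):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        # Project a pre-extracted image/audio feature into the model dimension.
        self.feat_proj = nn.Linear(feat_dim, d_model)
        self.transformer = nn.Transformer(d_model=d_model, nhead=4,
                                          num_encoder_layers=2,
                                          num_decoder_layers=2,
                                          batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_ids, tgt_ids, feat):
        # src_ids: (B, S) source token ids; feat: (B, feat_dim) modality vector
        src = self.src_emb(src_ids)                    # (B, S, d_model)
        feat_tok = self.feat_proj(feat).unsqueeze(1)   # (B, 1, d_model)
        src = torch.cat([feat_tok, src], dim=1)        # fuse modality with text
        tgt = self.tgt_emb(tgt_ids)
        hidden = self.transformer(src, tgt)            # (B, T, d_model)
        return self.out(hidden)                        # (B, T, tgt_vocab)

# Toy usage with random data, only to show the tensor shapes involved.
model = MultimodalNMT(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (2, 7))        # batch of 2 source sentences
tgt = torch.randint(0, 120, (2, 5))        # batch of 2 target prefixes
img_feat = torch.randn(2, 2048)            # e.g. pooled CNN features per image
print(model(src, tgt, img_feat).shape)     # torch.Size([2, 5, 120])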
Pages: 13137 - 13157
Number of pages: 20
Related papers
50 items in total
  • [1] Exploiting multiple correlated modalities can enhance low-resource machine translation quality
    Meetei, Loitongbam Sanayai
    Singh, Thoudam Doren
    Bandyopadhyay, Sivaji
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (05) : 13137 - 13157
  • [2] Survey of Low-Resource Machine Translation
    Haddow, Barry
    Bawden, Rachel
    Barone, Antonio Valerio Miceli
    Helcl, Jindrich
    Birch, Alexandra
    COMPUTATIONAL LINGUISTICS, 2022, 48 (03) : 673 - 732
  • [3] Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task?
    Fourrier, Clementine
    Bawden, Rachel
    Sagot, Benoit
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 847 - 861
  • [4] Translation Memories as Baselines for Low-Resource Machine Translation
    Knowles, Rebecca
    Littell, Patrick
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6759 - 6767
  • [5] Simulated Multiple Reference Training Improves Low-Resource Machine Translation
    Khayrallah, Huda
    Thompson, Brian
    Post, Matt
    Koehn, Philipp
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 82 - 89
  • [6] Efficient Neural Machine Translation for Low-Resource Languages via Exploiting Related Languages
    Goyal, Vikrant
    Kumar, Sourav
    Sharma, Dipti Misra
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020): STUDENT RESEARCH WORKSHOP, 2020, : 162 - 168
  • [7] Machine Translation into Low-resource Language Varieties
    Kumar, Sachin
    Anastasopoulos, Antonios
    Wintner, Shuly
    Tsvetkov, Yulia
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 110 - 121
  • [8] A Survey on Low-Resource Neural Machine Translation
    Wang, Rui
    Tan, Xu
    Luo, Renqian
    Qin, Tao
    Liu, Tie-Yan
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 4636 - 4643
  • [9] Transformers for Low-resource Neural Machine Translation
    Gezmu, Andargachew Mekonnen
    Nuernberger, Andreas
    ICAART: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 1, 2022, : 459 - 466
  • [10] A Survey on Low-resource Neural Machine Translation
    Li H.-Z.
    Feng C.
    Huang H.-Y.
    Huang, He-Yan (hhy63@bit.edu.cn), Science Press, 47: 1217 - 1231