Exploiting multiple correlated modalities can enhance low-resource machine translation quality

Cited by: 0
Authors
Loitongbam Sanayai Meetei
Thoudam Doren Singh
Sivaji Bandyopadhyay
Affiliations
[1] National Institute of Technology Silchar, Center for Natural Language Processing (CNLP)
[2] National Institute of Technology Silchar, Department of Computer Science and Engineering
[3] Jadavpur University, Department of Computer Science and Engineering
Source
Multimedia Tools and Applications | 2024 / Vol. 83
Keywords
Multimodal machine translation; Image-guided; Speech-guided; Low-resource; Manipuri; Hindi; Neural machine translation;
DOI
Not available
Abstract
In an effort to enhance the machine translation (MT) quality of low-resource languages, we report the first study on multimodal machine translation (MMT) for the Manipuri→English, Manipuri→Hindi and Manipuri→German language pairs. Manipuri is a morphologically rich, resource-constrained language with limited resources that can be computationally utilized, and no MMT dataset has been reported for these language pairs to date. To build the parallel datasets, we collected news articles containing images and associated text in English from a local daily newspaper and used English as a pivot language. The machine-translated outputs of existing translation systems for these languages were manually post-edited to build the datasets. In addition to text, we build MT systems by exploiting features from images and audio recordings in the source language, i.e., Manipuri. We carried out an extensive analysis of the MT systems trained with text-only and multimodal inputs using automatic metrics and human evaluation. Our findings attest that integrating multiple correlated modalities enhances MT performance in low-resource settings, achieving a significant improvement of up to +3 BLEU. The human assessment revealed that the fluency score of the MMT systems depends on the type of correlated auxiliary modality.
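To make the reported comparison concrete, the sketch below shows one conventional way to score a text-only and a multimodal system with corpus-level BLEU using the sacreBLEU toolkit in Python. The file names and the choice of toolkit are illustrative assumptions; the paper only states that automatic metrics such as BLEU were used, not this exact tooling.

    # Minimal sketch (assumption): comparing a text-only and a multimodal MT
    # system with corpus-level BLEU via sacreBLEU. The paper reports gains of
    # up to +3 BLEU but does not prescribe this tooling or file layout.
    import sacrebleu

    def read_lines(path):
        # One translation (or reference) per line, UTF-8 encoded.
        with open(path, encoding="utf-8") as f:
            return [line.strip() for line in f]

    references = read_lines("test.ref.en")             # hypothetical reference file
    text_only  = read_lines("test.hyp.text_only.en")   # hypothetical text-only NMT output
    multimodal = read_lines("test.hyp.multimodal.en")  # hypothetical image/speech-guided output

    bleu_text  = sacrebleu.corpus_bleu(text_only, [references])
    bleu_multi = sacrebleu.corpus_bleu(multimodal, [references])

    print(f"text-only BLEU : {bleu_text.score:.2f}")
    print(f"multimodal BLEU: {bleu_multi.score:.2f}")
    print(f"delta          : {bleu_multi.score - bleu_text.score:+.2f}")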
Pages: 13137 - 13157
Number of pages: 20