Exploiting multiple correlated modalities can enhance low-resource machine translation quality

Cited by: 0
Authors
Loitongbam Sanayai Meetei
Thoudam Doren Singh
Sivaji Bandyopadhyay
Affiliations
[1] National Institute of Technology Silchar, Center for Natural Language Processing (CNLP)
[2] National Institute of Technology Silchar, Department of Computer Science and Engineering
[3] Jadavpur University, Department of Computer Science and Engineering
Keywords
Multimodal machine translation; Image-guided; Speech-guided; Low-resource; Manipuri; Hindi; Neural machine translation;
DOI: not available
Abstract
In an effort to enhance the machine translation (MT) quality of low-resource languages, we report the first study on multimodal machine translation (MMT) for the Manipuri→English, Manipuri→Hindi and Manipuri→German language pairs. Manipuri is a morphologically rich language with very limited resources that can be computationally utilized, and no MMT dataset has been reported for these language pairs to date. To build the parallel datasets, we collected news articles containing images and their associated English text from a local daily newspaper and used English as a pivot language. The machine-translated outputs of existing translation systems for these languages were manually post-edited to build the datasets. In addition to text, we built MT systems that exploit features from images and audio recordings in the source language, i.e., Manipuri. We carried out an extensive analysis of the MT systems trained with text-only and multimodal inputs using automatic metrics and human evaluation. Our findings attest that integrating multiple correlated modalities enhances MT performance in low-resource settings, achieving a significant improvement of up to +3 BLEU. The human assessment revealed that the fluency score of the MMT systems depends on the type of correlated auxiliary modality.
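The abstract does not specify the fusion architecture, so the following is a minimal, hypothetical PyTorch sketch of one common way an image modality is injected into a neural MT model: a pre-extracted global visual vector is projected into the model dimension and exposed to the decoder's attention alongside the source-text encoder states. The class and parameter names (ImageGuidedNMT, img_dim, d_model) are illustrative assumptions, not the authors' implementation; the same pattern could, in principle, be reused for the speech modality by swapping the image vector for a pooled acoustic feature.

```python
# Hypothetical sketch of image-guided NMT fusion; not the paper's reported model.
import torch
import torch.nn as nn


class ImageGuidedNMT(nn.Module):
    def __init__(self, vocab_src, vocab_tgt, d_model=256, img_dim=2048):
        super().__init__()
        self.src_emb = nn.Embedding(vocab_src, d_model)
        self.tgt_emb = nn.Embedding(vocab_tgt, d_model)
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        # Project a pre-extracted global CNN feature (e.g., a 2048-d pooled
        # vector) into the model dimension so it can join the text states.
        self.img_proj = nn.Linear(img_dim, d_model)
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.out = nn.Linear(d_model, vocab_tgt)

    def forward(self, src_ids, img_feat, tgt_ids):
        enc_states, _ = self.encoder(self.src_emb(src_ids))   # (B, S, d)
        img_token = self.img_proj(img_feat).unsqueeze(1)       # (B, 1, d)
        # Treat the projected image vector as one extra "token" the decoder
        # can attend to alongside the source-text states.
        memory = torch.cat([enc_states, img_token], dim=1)     # (B, S+1, d)
        dec_states, _ = self.decoder(self.tgt_emb(tgt_ids))
        ctx, _ = self.attn(dec_states, memory, memory)
        return self.out(ctx + dec_states)                      # (B, T, vocab_tgt)


if __name__ == "__main__":
    model = ImageGuidedNMT(vocab_src=8000, vocab_tgt=8000)
    src = torch.randint(0, 8000, (2, 12))   # toy source (Manipuri) token ids
    img = torch.randn(2, 2048)               # pooled visual feature per sentence
    tgt = torch.randint(0, 8000, (2, 10))    # toy target (English) token ids
    print(model(src, img, tgt).shape)         # torch.Size([2, 10, 8000])
```

Gains such as the reported +3 BLEU would typically be measured on held-out test data with a standard tool such as sacrebleu (e.g., sacrebleu.corpus_bleu(hypotheses, [references])), complemented by the human fluency and adequacy judgments the abstract mentions.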
Pages: 13137 - 13157 (20 pages)
Related papers
(50 in total)
  • [21] Better Low-Resource Machine Translation with Smaller Vocabularies
    Signoroni, Edoardo
    Rychly, Pavel
    TEXT, SPEECH, AND DIALOGUE, TSD 2024, PT I, 2024, 15048 : 184 - 195
  • [22] Recent advances of low-resource neural machine translation
    Haque, Rejwanul
    Liu, Chao-Hong
    Way, Andy
    MACHINE TRANSLATION, 2021, 35 (04) : 451 - 474
  • [23] A Strategy for Referential Problem in Low-Resource Neural Machine Translation
    Ji, Yatu
    Shi, Lei
    Su, Yila
    Ren, Qing-dao-er-ji
    Wu, Nier
    Wang, Hongbin
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2021, PT V, 2021, 12895 : 321 - 332
  • [24] Low-Resource Neural Machine Translation with Neural Episodic Control
    Wu, Nier
    Hou, Hongxu
    Sun, Shuo
    Zheng, Wei
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [25] Introduction to the second issue on machine translation for low-resource languages
    Liu, Chao-Hong
    Karakanta, Alina
    Tong, Audrey N.
    Aulov, Oleg
    Soboroff, Ian M.
    Washington, Jonathan
    Zhao, Xiaobing
    MACHINE TRANSLATION, 2021, 35 (01) : 1 - 2
  • [26] Machine Translation in Low-Resource Languages by an Adversarial Neural Network
    Sun, Mengtao
    Wang, Hao
    Pasquine, Mark
    Hameed, Ibrahim A.
    APPLIED SCIENCES-BASEL, 2021, 11 (22):
  • [27] Authenticated key establishment for low-resource devices exploiting correlated random channels
    Zenger, Christian T.
    Pietersz, Mario
    Zimmer, Jan
    Posielek, Jan-Felix
    Lenze, Thorben
    Paar, Christof
    COMPUTER NETWORKS, 2016, 109 : 105 - 123
  • [28] Unsupervised Source Hierarchies for Low-Resource Neural Machine Translation
    Currey, Anna
    Heafield, Kenneth
    RELEVANCE OF LINGUISTIC STRUCTURE IN NEURAL ARCHITECTURES FOR NLP, 2018, : 6 - 12
  • [29] Language Model Prior for Low-Resource Neural Machine Translation
    Baziotis, Christos
    Haddow, Barry
    Birch, Alexandra
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 7622 - 7634
  • [30] Automatic Machine Translation of Poetry and a Low-Resource Language Pair
    Dunder, I.
    Seljan, S.
    Pavlovski, M.
    2020 43RD INTERNATIONAL CONVENTION ON INFORMATION, COMMUNICATION AND ELECTRONIC TECHNOLOGY (MIPRO 2020), 2020, : 1034 - 1039