An empirical study of a novel multimodal dataset for low-resource machine translation

Authors
Meetei, Loitongbam Sanayai [1, 2]
Singh, Thoudam Doren [1 ]
Bandyopadhyay, Sivaji [3 ]
Affiliations
[1] Natl Inst Technol Silchar, Dept Comp Sci & Engn, Silchar, Assam, India
[2] Siksha O Anusandhan Deemed Be Univ, ITER, Bhubaneswar, Odisha, India
[3] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, West Bengal, India
Keywords
Multimodal machine translation; Low resource; Manipuri; Bengali; Visual and Speech guided;
DOI
10.1007/s10115-024-02087-6
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Cues from multiple modalities have been successfully applied in several fields of natural language processing, including machine translation (MT). However, the application of multimodal cues in low-resource MT (LRMT) is still an open research problem. The main challenge of LRMT is the lack of abundant parallel data, which makes it difficult to build MT systems that produce reasonable output. Multimodal cues can provide additional context and information that help to mitigate this challenge. To this end, we present a multimodal machine translation (MMT) dataset for a low-resource language pair, Manipuri-English. The dataset consists of images, audio and corresponding parallel text. The text is collected from news articles in local daily newspapers and subsequently translated into the target language by translators who are native speakers. Native speakers also recorded an audio version of the Manipuri text for the experiments. The study further investigates whether correlated audio-visual cues enhance the performance of the machine translation system. Several experiments are conducted to systematically evaluate the effectiveness of utilizing multiple modalities. Using automatic metrics and human evaluation, a detailed analysis of MT systems trained with text-only and multimodal inputs is carried out. Experimental results show that MT systems in low-resource settings can be significantly improved, by up to 2.7 BLEU points, by incorporating correlated modalities. The human evaluation reveals that the type of correlated auxiliary modality affects the adequacy and fluency of the MMT systems. Our results emphasize the potential of using cues from auxiliary modalities to enhance machine translation systems, particularly in situations with limited resources.
Pages: 7031-7055
Page count: 25