Efficient Data Augmentation via lexical matching for boosting performance on Statistical Machine Translation for Indic and a Low-resource language

Authors
Saxena, Shefali [1]
Gupta, Ayush [1]
Daniel, Philemon [1]
Affiliation
[1] National Institute of Technology Hamirpur, Department of Electronics and Communication Engineering, Hamirpur, India
Keywords
Data Augmentation; Low-resource language; Machine Translation; Evaluation
DOI
10.1007/s11042-023-18086-8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the rapid advancement of AI technology in recent years, many effective Data Augmentation (DA) approaches have been investigated to increase data efficiency in Natural Language Processing (NLP). NLP models rely on large amounts of data. Producing that data, for example by labelling enormous volumes of text, requires substantial time, money, and human effort, yet a better model typically requires still more data. Text DA techniques address this by expanding the existing data, which improves the model's accuracy and robustness. The cornerstone of this work is a novel lexical matching approach used to improve the quality of a Machine Translation (MT) system. The proposed techniques are examined on resource-rich Indic languages (i.e., the Indo-Aryan and Dravidian language families). Extensive experiments on a range of language pairs show that the proposed method, applied to the augmented dataset, achieves significantly higher BLEU, METEOR, and ROUGE scores than the baseline system.
Pages: 64255-64269
Number of pages: 15