Efficient Data Augmentation via lexical matching for boosting performance on Statistical Machine Translation for Indic and a Low-resource language

Cited by: 0
Authors
Saxena, Shefali [1]
Gupta, Ayush [1]
Daniel, Philemon [1]
Affiliations
[1] Natl Inst Technol Hamirpur, Dept Elect & Commun Engn, Hamirpur, India
Keywords
Data Augmentation; Low-resource language; Machine Translation; Evaluation;
DOI
10.1007/s11042-023-18086-8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
With the rapid advancement of AI technology in recent years, many Data Augmentation (DA) approaches have been investigated to increase data efficiency in Natural Language Processing (NLP). NLP models rely on large amounts of data, and labelling enormous amounts of textual data requires substantial time, money, and human resources; yet a better model requires more data. Text DA techniques address this by extending the data, enhancing the model's accuracy and resilience. A novel lexical-based matching approach is the cornerstone of this work; it is used to improve the quality of the Machine Translation (MT) system. This study includes resource-rich Indic languages (i.e., from the Indo-Aryan and Dravidian language families) to examine the proposed techniques. Extensive experiments on a range of language pairs show that the proposed method, applied to the enhanced dataset, significantly improves BLEU, METEOR and ROUGE evaluation scores over the baseline system.
Pages: 64255-64269
Number of pages: 15
Related papers
50 in total
  • [1] Data Augmentation for Low-Resource Neural Machine Translation
    Fadaee, Marzieh
    Bisazza, Arianna
    Monz, Christof
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 2, 2017, : 567 - 573
  • [2] Extremely Low-resource Multilingual Neural Machine Translation for Indic Mizo Language
    Lalrempuii C.
    Soni B.
    International Journal of Information Technology, 2023, 15 (8) : 4275 - 4282
  • [3] STA: An efficient data augmentation method for low-resource neural machine translation
    Li, Fuxue
    Chi, Chuncheng
    Yan, Hong
    Liu, Beibei
    Shao, Mingzhi
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (01) : 121 - 132
  • [4] Generative-Adversarial Networks for Low-Resource Language Data Augmentation in Machine Translation
    Zeng, Linda
    2024 6TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING, ICNLP 2024, 2024, : 11 - 18
  • [5] Improved Unsupervised Statistical Machine Translation via Unsupervised Word Sense Disambiguation for a Low-Resource and Indic Languages
    Saxena, Shefali
    Chaurasia, Uttkarsh
    Bansal, Nitin
    Daniel, Philemon
    IETE JOURNAL OF RESEARCH, 2023, 69 (12) : 8848 - 8858
  • [6] A Diverse Data Augmentation Strategy for Low-Resource Neural Machine Translation
    Li, Yu
    Li, Xiao
    Yang, Yating
    Dong, Rui
    INFORMATION, 2020, 11 (05)
  • [7] Generalized Data Augmentation for Low-Resource Translation
    Xia, Mengzhou
    Kong, Xiang
    Anastasopoulos, Antonios
    Neubig, Graham
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 5786 - 5796
  • [8] Machine Translation into Low-resource Language Varieties
    Kumar, Sachin
    Anastasopoulos, Antonios
    Wintner, Shuly
    Tsvetkov, Yulia
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 110 - 121
  • [9] A Bilingual Templates Data Augmentation Method for Low-Resource Neural Machine Translation
    Li, Fuxue
    Liu, Beibei
    Yan, Hong
    Shao, Mingzhi
    Xie, Peijun
    Li, Jiarui
    Chi, Chuncheng
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024, 2024, 14877 : 40 - 51
  • [10] Boosting the Transformer with the BERT Supervision in Low-Resource Machine Translation
    Yan, Rong
    Li, Jiang
    Su, Xiangdong
    Wang, Xiaoming
    Gao, Guanglai
    APPLIED SCIENCES-BASEL, 2022, 12 (14):