Improving Loanword Identification in Low-Resource Language with Data Augmentation and Multiple Feature Fusion

Cited by: 1
Authors
Mi, Chenggang [1]
Zhu, Shaolin [2]
Nie, Rui [3]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[2] Zhengzhou Univ Light Ind, Coll Software Engn, Zhengzhou, Peoples R China
[3] Chinese Flight Test Estab, Xian, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Computational linguistics - Natural language processing systems;
DOI
10.1155/2021/9975078
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Loanword identification has been studied in recent years to alleviate data sparseness in several natural language processing (NLP) tasks, such as machine translation and cross-lingual information retrieval. However, recent studies on this topic have mostly focused on high-resource languages (such as Chinese, English, and Russian); for low-resource languages such as Uyghur and Mongolian, limited resources and a lack of annotated data mean that loanword identification tends to perform worse. To overcome this problem, we first propose a lexical constraint-based data augmentation method to generate training data for low-resource loanword identification; we then introduce a loanword identification model based on a log-linear RNN, which improves performance by incorporating features such as word-level embeddings, character-level embeddings, pronunciation similarity, and part-of-speech (POS) into one model. Experimental results on loanword identification in Uyghur (in this study, we mainly focus on Arabic, Chinese, Russian, and Turkish loanwords in Uyghur) show that the proposed method achieves the best performance compared with several strong baseline systems.
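The abstract does not say how the features are combined, so the following is only a generic sketch of a log-linear combination, p(label) ∝ exp(Σ w_i · f_i(label)), and not the paper's log-linear RNN. The function name, the feature names, the per-label scores, and the weights are all hypothetical values chosen for illustration.

```python
import math


def log_linear_probs(feature_scores, weights):
    """Return p(label) proportional to exp(sum_i w_i * f_i(label))."""
    totals = {
        label: sum(weights[name] * value for name, value in feats.items())
        for label, feats in feature_scores.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(totals.values())
    exps = {label: math.exp(total - peak) for label, total in totals.items()}
    norm = sum(exps.values())
    return {label: value / norm for label, value in exps.items()}


# Hypothetical per-label feature scores for a single token (assumed values).
feature_scores = {
    "loanword": {"word_emb": 0.2, "char_emb": 0.6, "pron_sim": 0.9, "pos": 0.1},
    "native": {"word_emb": 0.5, "char_emb": 0.3, "pron_sim": 0.1, "pos": 0.4},
}
# Hypothetical feature weights; in a trained model these would be learned.
weights = {"word_emb": 1.0, "char_emb": 1.0, "pron_sim": 2.0, "pos": 0.5}

probs = log_linear_probs(feature_scores, weights)
print(probs)
```

With these assumed numbers, the high pronunciation-similarity score dominates the weighted sum, so the token is more likely to be labeled a loanword than a native word.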
Pages: 9
Related papers
50 in total
  • [31] Data augmentation for low-resource languages NMT guided by constrained sampling
    Maimaiti, Mieradilijiang
    Liu, Yang
    Luan, Huanbo
    Sun, Maosong
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2022, 37 (01) : 30 - 51
  • [32] Data Augmentation, Feature Combination, and Multilingual Neural Networks to Improve ASR and KWS Performance for Low-resource Languages
    Tueske, Zoltan
    Golik, Pavel
    Nolden, David
    Schlueter, Ralf
    Ney, Hermann
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 1420 - 1424
  • [33] A Diverse Data Augmentation Strategy for Low-Resource Neural Machine Translation
    Li, Yu
    Li, Xiao
    Yang, Yating
    Dong, Rui
    INFORMATION, 2020, 11 (05)
  • [34] Optimizing the impact of data augmentation for low-resource grammatical error correction
    Solyman, Aiman
    Zappatore, Marco
    Zhenyu, Wang
    Mahmoud, Zeinab
    Alfatemi, Ali
    Ibrahim, Ashraf Osman
    Gabralla, Lubna Abdelkareim
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2023, 35 (06)
  • [35] DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks
    Ding, Bosheng
    Liu, Linlin
    Bing, Lidong
    Kruengkrai, Canasai
    Nguyen, Thien Hai
    Joty, Shafiq
    Si, Luo
    Miao, Chunyan
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 6045 - 6057
  • [36] Adversarial Word Dilution as Text Data Augmentation in Low-Resource Regime
    Chen, Junfan
    Zhang, Richong
    Luo, Zheyan
    Hu, Chunming
    Mao, Yongyi
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11, 2023, : 12626 - 12634
  • [37] Examining Sentiment Analysis for Low-Resource Languages with Data Augmentation Techniques
    Thakkar, Gaurish
    Preradovic, Nives Mikelic
    Tadic, Marko
    ENG, 2024, 5 (04): : 2920 - 2942
  • [38] LOW-RESOURCE EXPRESSIVE TEXT-TO-SPEECH USING DATA AUGMENTATION
    Huybrechts, Goeric
    Merritt, Thomas
    Comini, Giulia
    Perz, Bartek
    Shah, Raahil
    Lorenzo-Trueba, Jaime
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6593 - 6597
  • [39] Low-Resource Comparative Opinion Quintuple Extraction by Data Augmentation with Prompting
    Xu, Qingting
    Hong, Yu
    Zhao, Fubang
    Song, Kaisong
    Kang, Yangyang
    Chen, Jiaxiang
    Zhou, Guodong
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 3892 - 3897
  • [40] Data Augmentation via Dependency Tree Morphing for Low-Resource Languages
    Sahin, Goezde Guel
    Steedman, Mark
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 5004 - 5009