Getting More Data for Low-resource Morphological Inflection: Language Models and Data Augmentation

Cited by: 0
Author
Sorokin, Alexey [1]
Affiliation
[1] Moscow MV Lomonosov State Univ, Moscow Inst Phys & Technol, Fac Math & Mech, Leninskie Gory,GSP 1, Moscow, Russia
Keywords
inflection; encoder-decoder; abstract paradigms; language models; data augmentation
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
We investigate the effect of data augmentation on low-resource morphological inflection. We compare two settings: the pure low-resource one, where only 100 annotated word forms are available, and the augmented one, where we use the original training set and 1000 unlabeled word forms to generate 1000 artificial inflected forms. Evaluating on the SIGMORPHON 2018 dataset, we observe that using the better of these two models reduces the error rate of the state-of-the-art model by 6%, while for our baseline model the error reduction is 17%.
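The comparison described in the abstract (choosing the better of two models and reporting an error reduction) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the language names, the accuracy values, and the helper names `relative_error_reduction` and `pick_better` are hypothetical and do not come from the paper. The abstract does not state whether its 6% and 17% figures are relative or absolute reductions; the sketch shows the relative variant.

```python
from typing import Dict


def relative_error_reduction(baseline_acc: float, improved_acc: float) -> float:
    """Fraction of the baseline error removed by the improved model.

    Accuracies are in [0, 1]; the error rate is taken as 1 - accuracy.
    """
    baseline_err = 1.0 - baseline_acc
    improved_err = 1.0 - improved_acc
    if baseline_err == 0.0:
        raise ValueError("baseline has no errors, so reduction is undefined")
    return (baseline_err - improved_err) / baseline_err


def pick_better(acc_a: Dict[str, float], acc_b: Dict[str, float]) -> Dict[str, float]:
    """For each language, keep the higher of the two accuracies."""
    return {lang: max(acc_a[lang], acc_b[lang]) for lang in acc_a}


# Hypothetical per-language accuracies (NOT results from the paper).
low_resource = {"lang_a": 0.40, "lang_b": 0.55}  # trained on 100 labeled forms
augmented = {"lang_a": 0.46, "lang_b": 0.52}     # plus 1000 artificial forms

best = pick_better(low_resource, augmented)
for lang in best:
    reduction = relative_error_reduction(low_resource[lang], best[lang])
    print(f"{lang}: accuracy {best[lang]:.2f}, relative error reduction {reduction:.1%}")
```

In practice the choice between the two models would be made on held-out development data rather than on the test set; the sketch leaves that detail out.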
Pages: 3978 - 3983
Number of pages: 6