Getting More Data for Low-resource Morphological Inflection: Language Models and Data Augmentation

Times cited: 0
Author
Sorokin, Alexey [1]
Affiliation
[1] Moscow MV Lomonosov State Univ, Moscow Inst Phys & Technol, Fac Math & Mech, Leninskie Gory, GSP 1, Moscow, Russia
Keywords
inflection; encoder-decoder; abstract paradigms; language models; data augmentation
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We investigate the effect of data augmentation on low-resource morphological inflection. We compare two settings: the pure low-resource one, in which only 100 annotated word forms are available, and the augmented one, in which we use the original training set and 1000 unlabeled word forms to generate 1000 artificial inflected forms. Evaluating on the SIGMORPHON 2018 dataset, we observe that using the better of these two models reduces the error rate of the state-of-the-art model by 6%, while for our baseline model the error reduction is 17%.
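The abstract does not spell out how the 1000 artificial forms are generated. Given the keyword "abstract paradigms", the following is a minimal sketch of one way such augmentation can work, not the author's exact method. It assumes a simplified single-variable paradigm (the longest common substring of lemma and form is treated as the stem), and it assumes each unlabeled word is used as a candidate lemma. The names `Paradigm`, `extract_paradigm`, `apply_paradigm` and `augment`, the `n_target` parameter, and the Spanish example with its tag string are all hypothetical illustrations.

```python
import random
from difflib import SequenceMatcher
from typing import List, NamedTuple, Optional, Tuple

Triple = Tuple[str, str, str]  # (lemma, inflected form, morphological tags)


class Paradigm(NamedTuple):
    """Hypothetical simplified single-variable abstract paradigm:
    lemma = lemma_prefix + x + lemma_suffix, form = form_prefix + x + form_suffix."""
    lemma_prefix: str
    lemma_suffix: str
    form_prefix: str
    form_suffix: str
    tags: str


def extract_paradigm(lemma: str, form: str, tags: str) -> Optional[Paradigm]:
    """Abstract a (lemma, form) pair by treating their longest common substring as the stem x."""
    match = SequenceMatcher(None, lemma, form, autojunk=False).find_longest_match(
        0, len(lemma), 0, len(form))
    if match.size == 0:
        return None  # no shared material, nothing to abstract over
    return Paradigm(
        lemma[:match.a], lemma[match.a + match.size:],
        form[:match.b], form[match.b + match.size:],
        tags,
    )


def apply_paradigm(p: Paradigm, word: str) -> Optional[str]:
    """Inflect `word` with paradigm `p` if it fits the lemma pattern with a non-empty stem."""
    if len(word) <= len(p.lemma_prefix) + len(p.lemma_suffix):
        return None
    if not (word.startswith(p.lemma_prefix) and word.endswith(p.lemma_suffix)):
        return None
    stem = word[len(p.lemma_prefix):len(word) - len(p.lemma_suffix)]
    return p.form_prefix + stem + p.form_suffix


def augment(train: List[Triple], unlabeled: List[str], n_target: int, seed: int = 0) -> List[Triple]:
    """Generate up to n_target distinct artificial (lemma, form, tags) triples."""
    paradigms = [p for p in (extract_paradigm(*t) for t in train) if p is not None]
    candidates = []
    for word in unlabeled:  # assumption: each unlabeled word is a candidate lemma
        for p in paradigms:
            form = apply_paradigm(p, word)
            if form is not None:
                candidates.append((word, form, p.tags))
    random.Random(seed).shuffle(candidates)  # random order, so the kept subset is a random sample
    out: List[Triple] = []
    seen = set()
    for triple in candidates:
        if len(out) >= n_target:
            break
        if triple not in seen:  # skip duplicates produced by different paradigms
            seen.add(triple)
            out.append(triple)
    return out


if __name__ == "__main__":
    train = [("hablar", "hablo", "V;IND;PRS;1;SG")]
    print(augment(train, ["cantar", "comer"], n_target=1000))
```

Running the script prints `[('cantar', 'canto', 'V;IND;PRS;1;SG')]`: the pair hablar/hablo yields the pattern "x+ar to x+o", which fits "cantar" but not "comer". A real system would also need to filter implausible outputs, since a single-variable paradigm over-generates.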
Pages: 3978-3983
Number of pages: 6