Getting More Data for Low-resource Morphological Inflection: Language Models and Data Augmentation

Authors
Sorokin, Alexey [1]
Affiliations
[1] Moscow MV Lomonosov State Univ, Moscow Inst Phys & Technol, Fac Math & Mech, Leninskie Gory,GSP 1, Moscow, Russia
Keywords
inflection; encoder-decoder; abstract paradigms; language models; data augmentation
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
We investigate the effect of data augmentation on low-resource morphological inflection. We compare two settings: the pure low-resource one, where only 100 annotated word forms are available, and the augmented one, where we use the original training set and 1000 unlabeled word forms to generate 1000 artificial inflected forms. Evaluating on the SIGMORPHON 2018 dataset, we observe that using the better of these two models reduces the error rate of the state-of-the-art model by 6%, while for our baseline model the error reduction is 17%.
Pages: 3978-3983
Page count: 6