Getting More Data for Low-resource Morphological Inflection: Language Models and Data Augmentation

Cited by: 0
Authors
Sorokin, Alexey [1]
Affiliations
[1] Moscow MV Lomonosov State Univ, Moscow Inst Phys & Technol, Fac Math & Mech, Leninskie Gory,GSP 1, Moscow, Russia
Keywords
inflection; encoder-decoder; abstract paradigms; language models; data augmentation
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We investigate the effect of data augmentation on low-resource morphological inflection. We compare two settings: the pure low-resource one, where only 100 annotated word forms are available, and the augmented one, where we use the original training set and 1000 unlabeled word forms to generate 1000 artificial inflected forms. Evaluating on the Sigmorphon 2018 dataset, we observe that using the better of these two models reduces the error rate of the state-of-the-art model by 6%, while for our baseline model the error reduction is 17%.
Pages: 3978-3983
Page count: 6
Related papers
50 in total
  • [21] DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks
    Ding, Bosheng
    Liu, Linlin
    Bing, Lidong
    Kruengkrai, Canasai
    Nguyen, Thien Hai
    Joty, Shafiq
    Si, Luo
    Miao, Chunyan
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 6045 - 6057
  • [22] Examining Sentiment Analysis for Low-Resource Languages with Data Augmentation Techniques
    Thakkar, Gaurish
    Preradovic, Nives Mikelic
    Tadic, Marko
    ENG, 2024, 5 (04): : 2920 - 2942
  • [23] Adversarial Word Dilution as Text Data Augmentation in Low-Resource Regime
    Chen, Junfan
    Zhang, Richong
    Luo, Zheyan
    Hu, Chunming
    Mao, Yongyi
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11, 2023, : 12626 - 12634
  • [24] LOW-RESOURCE EXPRESSIVE TEXT-TO-SPEECH USING DATA AUGMENTATION
    Huybrechts, Goeric
    Merritt, Thomas
    Comini, Giulia
    Perz, Bartek
    Shah, Raahil
    Lorenzo-Trueba, Jaime
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6593 - 6597
  • [25] Low-Resource Comparative Opinion Quintuple Extraction by Data Augmentation with Prompting
    Xu, Qingting
    Hong, Yu
    Zhao, Fubang
    Song, Kaisong
    Kang, Yangyang
    Chen, Jiaxiang
    Zhou, Guodong
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 3892 - 3897
  • [26] Data Augmentation via Dependency Tree Morphing for Low-Resource Languages
    Sahin, Goezde Guel
    Steedman, Mark
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 5004 - 5009
  • [27] Understanding Compositional Data Augmentation in Typologically Diverse Morphological Inflection
    Samir, Farhan
    Silfverberg, Miikka
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 277 - 291
  • [28] Evaluation of the morphological rules for the Tenyidie language: a low-resource language
    Angami, Teisovi
    Kevichusa-Ezung, Mimi
    Singh, Sanasam Ranbir
    Tuithung, Themrichon
    LANGUAGE RESOURCES AND EVALUATION, 2024,
  • [29] Entropy-guided Vocabulary Augmentation of Multilingual Language Models for Low-resource Tasks
    Nag, Arijit
    Samanta, Bidisha
    Mukherjee, Animesh
    Ganguly, Niloy
    Chakrabarti, Soumen
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 8619 - 8629
  • [30] Text data augmentation and pre-trained Language Model for enhancing text classification of low-resource languages
    Ziyaden, Atabay
    Yelenov, Amir
    Hajiyev, Fuad
    Rustamov, Samir
    Pak, Alexandr
    PEERJ COMPUTER SCIENCE, 2024, 10