A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages

Cited by: 0
Authors
Vania, Clara [1]
Kementchedjhieva, Yova [2]
Sogaard, Anders [2]
Lopez, Adam [1]
Affiliations
[1] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland
[2] Univ Copenhagen, Copenhagen, Denmark
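Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract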
Parsers are available for only a handful of the world's languages, since they require lots of training data. How far can we get with just a small amount of training data? We systematically compare a set of simple strategies for improving low-resource parsers: data augmentation, which has not been tested before; cross-lingual training; and transliteration. Experimenting on three typologically diverse low-resource languages (North Sami, Galician, and Kazakh), we find that (1) when only the low-resource treebank is available, data augmentation is very helpful; (2) when a related high-resource treebank is available, cross-lingual training is helpful and complements data augmentation; and (3) when the high-resource treebank uses a different writing system, transliteration into a shared orthographic space is also very helpful.
Pages: 1105-1116
Page count: 12