Creating language resources for under-resourced languages: methodologies, and experiments with Arabic

Authors
Mahmoud El-Haj
Udo Kruschwitz
Chris Fox
Affiliations
[1] Lancaster University,School of Computing and Communications
[2] University of Essex,CSEE
Source
Language Resources and Evaluation, 2015, 49 (03)
Keywords
Resources; Summarisation; Arabic; Under-resourced languages
DOI
Not available
Abstract
Language resources are important for those working on computational methods to analyse and study languages. These resources are needed to help advance research in fields such as natural language processing, machine learning, information retrieval and text analysis in general. We describe the creation of useful resources for languages that currently lack them, taking resources for Arabic summarisation as a case study. We illustrate three different paradigms for creating language resources, namely: (1) using crowdsourcing to produce a small resource rapidly and relatively cheaply; (2) translating an existing gold-standard dataset, which is relatively easy but potentially of lower quality; and (3) using manual effort with appropriately skilled human participants to create a resource that is more expensive but of high quality. The last of these was used as a test collection for TAC-2011. An evaluation of the resources is also presented.
Pages: 549 - 580
Page count: 31
Related papers
50 in total
  • [1] Creating language resources for under-resourced languages: methodologies, and experiments with Arabic
    El-Haj, Mahmoud
    Kruschwitz, Udo
    Fox, Chris
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2015, 49 (03) : 549 - 580
  • [2] Evaluation of Dictionary Creating Methods for Under-Resourced Languages
    Simon, Eszter
    Mittelholcz, Ivan
    [J]. TEXT, SPEECH, AND DIALOGUE, TSD 2017, 2017, 10415 : 246 - 254
  • [3] Language Modeling for Speech Analytics in Under-Resourced Languages
    Wills, Simone
    Uys, Pieter
    van Heerden, Charl
    Barnard, Etienne
    [J]. INTERSPEECH 2020, 2020, : 4941 - 4945
  • [4] Language Identification for Under-Resourced Languages in the Basque Context
    Barroso, Nora
    de Ipina, Karmele Lopez
    Grana, Manuel
    Ezeiza, Aitzol
    [J]. SOFT COMPUTING MODELS IN INDUSTRIAL AND ENVIRONMENTAL APPLICATIONS, 6TH INTERNATIONAL CONFERENCE SOCO 2011, 2011, 87 : 475 - 483
  • [5] Eigentrigraphemes for under-resourced languages
    Ko, Tom
    Mak, Brian
    [J]. SPEECH COMMUNICATION, 2014, 56 : 132 - 141
  • [6] The LREMap for Under-Resourced Languages
    Del Gratta, Riccardo
    Frontini, Francesca
    Khan, Anas Fahad
    Mariani, Joseph
    Soria, Claudia
    [J]. LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014,
  • [7] Mismatched Crowdsourcing based Language Perception for Under-resourced Languages
    Chen, Wenda
    Hasegawa-Johnson, Mark
    Chen, Nancy F.
    [J]. SLTU-2016 5TH WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGIES FOR UNDER-RESOURCED LANGUAGES, 2016, 81 : 23 - 29
  • [8] Automatic processing of under-resourced languages
    Bernhard, Delphine
    Soria, Claudia
    [J]. TRAITEMENT AUTOMATIQUE DES LANGUES, 2018, 59 (03) : 7 - 14
  • [9] Transfer of Models and Resources for Under-Resourced Languages Semantic Role Labeling
    Mohamed, Yesuf
    Menzel, Wolfgang
    [J]. PAN-AFRICAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PT I, PANAFRICON AI 2023, 2024, 2068 : 141 - 153
  • [10] ASR and translation for under-resourced languages
    Besacier, L.
    Le, V-B.
    Boitet, C.
    Berment, V.
    [J]. 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 6079 - 6082