Large Scale Arabic Error Annotation: Guidelines and Framework

Cited by: 0
Authors
Zaghouani, Wajdi [1]
Mohit, Behrang [1]
Habash, Nizar [2]
Obeid, Ossama [1]
Tomeh, Nadi [3]
Rozovskaya, Alla [2]
Farra, Noura [2]
Alkuhlani, Sarah [2]
Oflazer, Kemal [1]
Affiliations
[1] Carnegie Mellon Univ Qatar, Doha, Qatar
[2] Columbia Univ, Ctr Computat Learning Syst, New York, NY 10027 USA
[3] Univ Paris 13, Sorbonne Paris Cite, F-93430 Villetaneuse, France
Keywords
Error Annotation; Arabic; Guidelines;
DOI
Not available
Chinese Library Classification number
H0 [Linguistics];
Discipline classification code
030303 ; 0501 ; 050102 ;
Abstract
We present annotation guidelines and a web-based annotation framework developed as part of an effort to create a manually annotated Arabic corpus of errors and corrections for various text types. Such a corpus will be invaluable for developing Arabic error correction tools, both for training models and as a gold standard for evaluating error correction algorithms. We summarize the guidelines we created. We also describe issues encountered during the training of the annotators, as well as problems that are specific to the Arabic language that arose during the annotation process. Finally, we present the annotation tool that was developed as part of this project, the annotation pipeline, and the quality of the resulting annotations.
Pages: 2362-2369
Page count: 8