Large Scale Arabic Error Annotation: Guidelines and Framework

Cited by: 0

Authors:

Zaghouani, Wajdi [1]

Mohit, Behrang [1]

Habash, Nizar [2]

Obeid, Ossama [1]

Tomeh, Nadi [3]

Rozovskaya, Alla [2]

Farra, Noura [2]

Alkuhlani, Sarah [2]

Oflazer, Kemal [1]

Affiliations:

[1] Carnegie Mellon Univ Qatar, Doha, Qatar

[2] Columbia Univ, Ctr Computat Learning Syst, New York, NY 10027 USA

[3] Univ Paris 13, Sorbonne Paris Cite, F-93430 Villetaneuse, France

Source:

LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2014

Keywords:

Error Annotation; Arabic; Guidelines;

DOI:

Not available

Chinese Library Classification (CLC) number:

H0 [Linguistics];

Discipline classification codes:

030303 ; 0501 ; 050102 ;

Abstract:

We present annotation guidelines and a web-based annotation framework developed as part of an effort to create a manually annotated Arabic corpus of errors and corrections for various text types. Such a corpus will be invaluable for developing Arabic error correction tools, both for training models and as a gold standard for evaluating error correction algorithms. We summarize the guidelines we created. We also describe issues encountered during the training of the annotators, as well as problems that are specific to the Arabic language that arose during the annotation process. Finally, we present the annotation tool that was developed as part of this project, the annotation pipeline, and the quality of the resulting annotations.

Pages: 2362-2369

Page count: 8
