ZAEBUC: An Annotated Arabic-English Bilingual Writer Corpus

Authors
Habash, Nizar [1]
Palfreyman, David [2]
Affiliations
[1] New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
[2] Zayed University, Abu Dhabi, United Arab Emirates
Keywords
Annotated Corpus; Learner Corpus; CEFR; Arabic; English
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
We present ZAEBUC, an annotated Arabic-English bilingual writer corpus comprising short essays by first-year university students at Zayed University in the United Arab Emirates. We describe and discuss the guidelines and pipeline processes we followed to create the annotations and to quality-check them. The annotations include spelling and grammar correction, morphological tokenization, Part-of-Speech (POS) tagging, lemmatization, and Common European Framework of Reference (CEFR) ratings. All annotations are applied to both the Arabic and the English texts, using consistent guidelines wherever possible, with tracked alignments among the different annotation layers and back to the original raw texts. For morphological tokenization, POS tagging, and lemmatization, we use existing automatic annotation tools followed by manual correction. We also present various measurements and correlations, along with preliminary insights drawn from the data and annotations. The publicly available ZAEBUC corpus and its annotations are intended to serve as a stepping stone for additional annotations.
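The abstract mentions tracked alignments between annotation layers and the raw text but does not say how they are computed. As a minimal sketch only, assuming whitespace-tokenized English input and using Python's standard-library `difflib` as a stand-in (not ZAEBUC's actual alignment tool), a raw sentence can be aligned to a spelling/grammar-corrected version as follows. The function name `align_tokens` and the example sentence are hypothetical.

```python
from difflib import SequenceMatcher


def align_tokens(raw, corrected):
    """Align a raw token list to its corrected version (hypothetical helper).

    Returns (raw_span, corrected_span, op) triples, where op is one of
    'equal', 'replace', 'delete', or 'insert' as produced by difflib.
    """
    # autojunk=False so frequent tokens are never silently ignored
    matcher = SequenceMatcher(a=raw, b=corrected, autojunk=False)
    alignment = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        alignment.append((raw[i1:i2], corrected[j1:j2], op))
    return alignment


# Toy example (invented, not taken from the corpus)
raw = "I am studing in Zayed Univercity".split()
corrected = "I am studying at Zayed University".split()

for raw_span, corr_span, op in align_tokens(raw, corrected):
    print(op, raw_span, "->", corr_span)
```

Here the misspelling and the preposition error end up in one `replace` span (`['studing', 'in'] -> ['studying', 'at']`) because they are adjacent. Arabic text would first need morphological tokenization, and a finer-grained aligner may be preferable for separating adjacent edits; this sketch covers neither.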
Pages: 79-88
Page count: 10