ZAEBUC: An Annotated Arabic-English Bilingual Writer Corpus

Authors
Habash, Nizar [1]
Palfreyman, David [2]
Affiliations
[1] New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
[2] Zayed University, Abu Dhabi, United Arab Emirates
Keywords
Annotated Corpus; Learner Corpus; CEFR; Arabic; English
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We present ZAEBUC, an annotated Arabic-English bilingual writer corpus comprising short essays by first-year university students at Zayed University in the United Arab Emirates. We describe and discuss the guidelines and pipeline processes we followed to create the annotations and quality-check them. The annotations include spelling and grammar correction, morphological tokenization, part-of-speech (POS) tagging, lemmatization, and Common European Framework of Reference (CEFR) ratings. All annotations are done on both the Arabic and English texts, using consistent guidelines as much as possible, with tracked alignments among the different annotations and to the original raw texts. For morphological tokenization, POS tagging, and lemmatization, we use existing automatic annotation tools followed by manual correction. We also present various measurements and correlations, with preliminary insights drawn from the data and annotations. The publicly available ZAEBUC corpus and its annotations are intended to be stepping stones for additional annotations.
Pages: 79-88
Number of pages: 10