Tharwa: A Large Scale Dialectal Arabic - Standard Arabic - English Lexicon

Authors
Diab, Mona [1]
Al-Badrashiny, Mohamed [1]
Aminian, Maryam [1]
Attia, Mohammed [1]
Dasigi, Pradeep [1]
Elfardy, Heba [2]
Eskander, Ramy [2]
Habash, Nizar [2]
Hawwari, Abdelati [1]
Salloum, Wael [2]
Affiliations
[1] George Washington Univ, Dept Comp Sci, Washington, DC 20052 USA
[2] Columbia Univ, Ctr Computat Learning Syst, New York, NY USA
Keywords
Egyptian Arabic Dictionary; Arabic Dialects; Arabic Morphology; Arabic Lexicon
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
We introduce Tharwa, an electronic three-way lexicon of Dialectal Arabic, Modern Standard Arabic, and English equivalents. The paper focuses on Egyptian Arabic as the pilot dialect for the resource, with plans to expand to other dialects of Arabic in later phases of the project. We describe Tharwa's creation process and report on its current status. The lexical entries are augmented with linguistic information such as POS, gender, rationality, number, and root and pattern information. The lexicon is compiled from existing monolingual and bilingual resources, including paper dictionaries and electronic, corpus-based dictionaries. Multiple levels of quality checks are performed on the output of each step in the creation process. The importance of this lexicon lies in the fact that it is the first resource of its kind bridging multiple variants of Arabic with English. Furthermore, it is a wide-coverage lexical resource containing over 73,000 Egyptian entries. Tharwa is publicly available. We believe it will have a significant impact on both theoretical and computational linguistics research.
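As a rough illustration of the kind of entry the abstract describes, the following is a minimal Python sketch of a three-way lexicon record with lookup by any of the three languages. It is not Tharwa's actual schema or file format: the class names, field names, feature values, and the two sample entries are assumptions chosen for illustration, and the root and pattern fields are left empty rather than guessed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LexiconEntry:
    """One hypothetical three-way entry (field names are illustrative)."""
    egyptian: str                      # Egyptian Arabic form
    msa: str                           # Modern Standard Arabic equivalent
    english: str                       # English equivalent
    pos: str                           # part of speech
    gender: Optional[str] = None
    number: Optional[str] = None
    rationality: Optional[str] = None
    root: Optional[str] = None         # left unset in this sketch
    pattern: Optional[str] = None      # left unset in this sketch


class ThreeWayLexicon:
    """Indexes entries so they can be looked up from any of the three sides."""

    def __init__(self) -> None:
        self._by_egyptian: Dict[str, List[LexiconEntry]] = defaultdict(list)
        self._by_msa: Dict[str, List[LexiconEntry]] = defaultdict(list)
        self._by_english: Dict[str, List[LexiconEntry]] = defaultdict(list)

    def add(self, entry: LexiconEntry) -> None:
        self._by_egyptian[entry.egyptian].append(entry)
        self._by_msa[entry.msa].append(entry)
        self._by_english[entry.english].append(entry)

    # .get() is used so that a failed lookup does not create an empty key.
    def from_egyptian(self, word: str) -> List[LexiconEntry]:
        return list(self._by_egyptian.get(word, []))

    def from_msa(self, word: str) -> List[LexiconEntry]:
        return list(self._by_msa.get(word, []))

    def from_english(self, word: str) -> List[LexiconEntry]:
        return list(self._by_english.get(word, []))


# Illustrative entries only; these are not taken from the Tharwa data.
lexicon = ThreeWayLexicon()
lexicon.add(LexiconEntry("عربية", "سيارة", "car", pos="noun",
                         gender="feminine", number="singular",
                         rationality="irrational"))
lexicon.add(LexiconEntry("ترابيزة", "طاولة", "table", pos="noun",
                         gender="feminine", number="singular",
                         rationality="irrational"))

for entry in lexicon.from_english("car"):
    print(entry.egyptian, entry.msa, entry.pos)
```

Keeping one index per language makes each direction of lookup a single dictionary access, at the cost of storing each entry reference three times.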
Pages: 3782-3789
Page count: 8
Related papers
  • [1] Morphological structure in the Arabic mental lexicon: Parallels between standard and dialectal Arabic
    Boudelaa, Sami
    Marslen-Wilson, William D.
    [J]. LANGUAGE AND COGNITIVE PROCESSES, 2013, 28 (10): 1453-1473
  • [2] A Large-Scale Leveled Readability Lexicon for Standard Arabic
    Al Khalil, Muhamed
    Habash, Nizar
    Jiang, Zhengyang
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020: 3053-3062
  • [3] The sociolinguistic functions of codeswitching between Standard Arabic and Dialectal Arabic
    Albirini, Abdulkafi
    [J]. LANGUAGE IN SOCIETY, 2011, 40 (5): 537-562
  • [4] Simplified guidelines for the creation of Large Scale Dialectal Arabic Annotations
    Elfardy, Heba
    Diab, Mona
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012: 371-378
  • [5] Sentiment Analysis of Modern Standard Arabic and Egyptian Dialectal Arabic Tweets
    El-Naggar, Nadine
    El-Sonbaty, Yasser
    Abou El-Nasr, Mohamad
    [J]. 2017 COMPUTING CONFERENCE, 2017: 880-887
  • [6] Automatic expandable large-scale sentiment lexicon of Modern Standard Arabic and Colloquial
    Ibrahim, Hossam S.
    Abdou, Sherif M.
    Gheith, Mervat
    [J]. 2015 FIRST INTERNATIONAL CONFERENCE ON ARABIC COMPUTATIONAL LINGUISTICS (ACLING 2015): ADVANCES IN ARABIC COMPUTATIONAL LINGUISTICS, 2015: 94-99
  • [7] Modern Standard Arabic Based Multilingual Approach for Dialectal Arabic Speech Recognition
    Elmahdy, Mohamed
    Gruhn, Rainer
    Minker, Wolfgang
    Abdennadher, Slim
    [J]. 2009 EIGHTH INTERNATIONAL SYMPOSIUM ON NATURAL LANGUAGE PROCESSING, PROCEEDINGS, 2009: 169-+
  • [8] DART: A Large Dataset of Dialectal Arabic Tweets
    Alsarsour, Israa
    Mohamed, Esraa
    Suwaileh, Reem
    Elsayed, Tamer
    [J]. PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018: 3666-3670
  • [9] Standard and Dialectal Arabic Text Classification for Sentiment Analysis
    Maghfour, Mohcine
    Elouardighi, Abdeljalil
    [J]. MODEL AND DATA ENGINEERING, MEDI 2018, 2018, 11163: 282-291
  • [10] Conventional Orthography for Dialectal Arabic
    Habash, Nizar
    Diab, Mona
    Rambow, Owen
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012: 711-718