SentiALG: Automated Corpus Annotation for Algerian Sentiment Analysis

Cited by: 21
Authors
Guellil, Imane [1, 2]
Adeel, Ahsan [3 ]
Azouaou, Faical [2 ]
Hussain, Amir [3 ]
Affiliations
[1] Ecole Super Sci Appl Alger ESSA, Algiers, Algeria
[2] Ecole Natl Super Informat, Lab Methodes Concept Syst LMCS, BP 68M, Algiers 16309, Algeria
[3] Univ Stirling, Inst Comp Sci & Math, Sch Nat Sci, Stirling, Scotland
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Arabic sentiment analysis; Algerian dialect; Sentiment lexicon; Sentiment corpus; Sentiment classification;
DOI
10.1007/978-3-030-00563-4_54
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data annotation is an important but time-consuming and costly procedure. To sort a text into two classes, the very first thing we need is a good annotation guideline, establishing what is required to qualify for each class. In the literature, the difficulties associated with appropriate data annotation have been underestimated. In this paper, we present a novel approach to automatically construct an annotated sentiment corpus for the Algerian dialect (a Maghrebi Arabic dialect). The construction of this corpus is based on an Algerian sentiment lexicon that is also constructed automatically. The presented work deals with the two scripts widely used on Arabic social media: Arabic and Arabizi. The proposed approach automatically constructs a sentiment corpus containing 8000 messages (4000 in Arabic and 4000 in Arabizi). The achieved F1-score is up to 72% and 78% for the Arabic and Arabizi test sets, respectively. Ongoing work is aimed at integrating a transliteration process for Arabizi messages to further improve the obtained results.
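The abstract does not spell out how the lexicon drives the annotation, so the following is only a minimal sketch of one common lexicon-based labeling scheme, not the authors' actual procedure. The toy lexicon entries, the whitespace tokenization, the sum-of-scores rule, and the function names `score_message`, `annotate`, and `build_corpus` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Toy lexicon: illustrative entries only, NOT taken from the SentiALG lexicon.
# Keys cover both scripts (Arabizi and Arabic); values are assumed polarities.
TOY_LEXICON: Dict[str, float] = {
    "mlih": 1.0,     # assumed positive entry (Arabizi)
    "khayeb": -1.0,  # assumed negative entry (Arabizi)
    "مليح": 1.0,     # assumed positive entry (Arabic script)
    "خايب": -1.0,    # assumed negative entry (Arabic script)
}


def score_message(message: str, lexicon: Dict[str, float]) -> float:
    """Sum the lexicon scores of the whitespace-separated tokens."""
    return sum(lexicon.get(token, 0.0) for token in message.lower().split())


def annotate(message: str, lexicon: Dict[str, float]) -> Optional[str]:
    """Return 'positive' or 'negative', or None when the score is zero."""
    score = score_message(message, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None  # no sentiment evidence: leave the message unlabeled


def build_corpus(
    messages: List[str], lexicon: Dict[str, float]
) -> List[Tuple[str, str]]:
    """Keep only the messages that receive a label."""
    corpus = []
    for message in messages:
        label = annotate(message, lexicon)
        if label is not None:
            corpus.append((message, label))
    return corpus
```

In this sketch, messages with a zero score are dropped rather than forced into a class, which keeps the automatically built corpus to two classes at the cost of coverage.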
Pages: 557-567
Number of pages: 11