AraSenTi-Tweet: A Corpus for Arabic Sentiment Analysis of Saudi Tweets

Cited by: 81
Authors
Al-Twairesh, Nora [1]
Al-Khalifa, Hend [1]
Al-Salman, AbdulMalik [1]
Al-Ohali, Yousef [1]
Affiliations
[1] King Saud Univ, Coll Comp & Informat Sci, Riyadh, Saudi Arabia
Keywords
Sentiment Analysis; Arabic NLP; Corpus Sentiment Annotation; Arabic tweets; Saudi Dialect; RESOURCES;
DOI
10.1016/j.procs.2017.10.094
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
Arabic Sentiment Analysis is an active research area these days. However, the Arabic language still lacks sufficient language resources to enable the tasks of sentiment analysis. In this paper, we present the details of collecting and constructing a large dataset of Arabic tweets. The techniques used in cleaning and pre-processing the collected dataset are explained. A corpus of Arabic tweets annotated for sentiment analysis was extracted from this dataset. The corpus consists mainly of tweets written in Modern Standard Arabic and the Saudi dialect. The corpus was manually annotated for sentiment. The annotation process is explained in detail and the challenges during the annotation are highlighted. The corpus contains 17,573 tweets labelled with four labels for sentiment: positive, negative, neutral and mixed. Baseline experiments were conducted to provide benchmark results for future work. © 2017 The Authors. Published by Elsevier B.V.
Pages: 63 - 72
Page count: 10
Related papers
50 in total
  • [31] Identifying Mubasher Software Products through Sentiment Analysis of Arabic Tweets
    AL-Rubaiee, Hamed
    Qiu, Renxi
    Li, Dayou
    [J]. 2016 INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS AND COMPUTER SYSTEMS (CIICS), 2016,
  • [32] HILATSA: A hybrid Incremental learning approach for Arabic tweets sentiment analysis
    Elshakankery, Kariman
    Ahmed, Mona F.
    [J]. EGYPTIAN INFORMATICS JOURNAL, 2019, 20 (03) : 163 - 171
  • [33] Sentiment Analysis Using Stacked Gated Recurrent Unit for Arabic Tweets
    Al Wazrah, Asma
    Alhumoud, Sarah
    [J]. IEEE ACCESS, 2021, 9 : 137176 - 137187
  • [34] Using Tweets and Emojis to Build TEAD: an Arabic Dataset for Sentiment Analysis
    Abdellaoui, Houssem
    Zrigui, Mounir
    [J]. COMPUTACION Y SISTEMAS, 2018, 22 (03) : 777 - 786
  • [35] Tweet Sentiment Analyzer: Sentiment Score Estimation Method for Assessing the Value of Opinions in Tweets
    Raja, Arun Manicka M.
    Swamynathan, S.
    [J]. INTERNATIONAL CONFERENCE ON ADVANCES IN INFORMATION COMMUNICATION TECHNOLOGY & COMPUTING, 2016, 2016,
  • [36] Sentiment analysis for Arabic tweet about the COVID-19 Worldwide Epidemic
    Alshutayri, Areej
    Alghamdi, Amal
    Nassibi, Nouran
    Aljojo, Nahla
    Aldhahri, Eman
    Aboulola, Omar
    [J]. ROMANIAN JOURNAL OF INFORMATION TECHNOLOGY AND AUTOMATIC CONTROL-REVISTA ROMANA DE INFORMATICA SI AUTOMATICA, 2022, 32 (02) : 127 - 136
  • [37] Building a Sentiment Corpus of Tweets in Brazilian Portuguese
    Brum, Henrico Bertini
    Volpe Nunes, Maria das Gracas
    [J]. PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018 : 4167 - 4172
  • [38] A Saudi Dialect Twitter Corpus for Sentiment and Emotion Analysis
    Al-Thubaity, Abdulmohsen
    Alharbi, Mohammed
    Alqahtani, Saif
    Aljandal, Abdulrahman
    [J]. 2018 21ST SAUDI COMPUTER SOCIETY NATIONAL COMPUTER CONFERENCE (NCC), 2018,
  • [39] Public perception of the Chinese president's visit to Saudi Arabia and the China-Arab Summit: sentiment analysis of Arabic tweets
    Hassan, Ahmed A. M.
    [J]. SOCIAL NETWORK ANALYSIS AND MINING, 2024, 14 (01)
  • [40] MAC: An Open and Free Moroccan Arabic Corpus for Sentiment Analysis
    Garouani, Moncef
    Kharroubi, Jamal
    [J]. 6TH INTERNATIONAL CONFERENCE ON SMART CITY APPLICATIONS, 2022, 393 : 849 - 858