A Semi-supervised Corpus Annotation for Saudi Sentiment Analysis Using Twitter

Cited by: 4
Authors
Alqarafi, Abdulrahman [1, 2]
Adeel, Ahsan [1]
Hawalah, Ahmed [2]
Swingler, Kevin [1]
Hussain, Amir [1]
Affiliations
[1] Univ Stirling, Dept Comp Sci & Math, CogBID Lab, Stirling FK9 4LA, Scotland
[2] Univ Taibah, Medina, Saudi Arabia
Keywords
Sentiment analysis; Saudi dialect; Word embedding
DOI
10.1007/978-3-030-00563-4_57
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Limited work has been reported in the literature on developing sentiment resources for the Saudi dialect. The lack of resources such as dialectal lexicons and corpora is one of the major bottlenecks to the successful development of Arabic sentiment analysis models. In this paper, a semi-supervised approach is presented to construct an annotated sentiment corpus for the Saudi dialect using Twitter. The presented approach is primarily based on a list of lexicons built using word embedding techniques such as word2vec. A large corpus extracted from Twitter is annotated and then manually reviewed to exclude incorrectly annotated tweets; the resulting corpus is publicly available. For corpus validation, state-of-the-art classification algorithms (such as Logistic Regression, Support Vector Machine, and Naive Bayes) are applied and evaluated. Simulation results demonstrate that the Naive Bayes algorithm outperformed all other approaches and achieved an accuracy of up to 91%.
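The lexicon-driven annotation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the lexicon entries, the matching rule, or the review procedure, so the `POSITIVE` and `NEGATIVE` sets (English placeholders standing in for Saudi-dialect terms), the count-based rule, and the function names `annotate` and `annotate_corpus` are all assumptions made for illustration.

```python
# Hypothetical seed lexicons. In the paper these would be Saudi-dialect terms
# expanded with word embeddings; the English placeholders here are assumptions.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def annotate(tweet, positive=POSITIVE, negative=NEGATIVE):
    """Label a tweet by counting lexicon hits.

    Returns 'positive' or 'negative', or None when the counts tie
    (including the case where no lexicon word appears).
    """
    tokens = tweet.lower().split()
    pos = sum(token in positive for token in tokens)
    neg = sum(token in negative for token in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None


def annotate_corpus(tweets):
    """Split tweets into automatically labeled pairs and a manual-review queue."""
    labeled, review = [], []
    for tweet in tweets:
        label = annotate(tweet)
        if label is None:
            review.append(tweet)  # no clear signal: leave for a human reviewer
        else:
            labeled.append((tweet, label))
    return labeled, review
```

In this sketch, tweets with no clear lexicon signal are set aside rather than guessed, which mirrors in a simplified way the manual review mentioned in the abstract.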
Pages: 589-596
Page count: 8
Related papers
50 in total
  • [1] AraSenCorpus: A Semi-Supervised Approach for Sentiment Annotation of a Large Arabic Text Corpus
    Al-Laith, Ali
    Shahbaz, Muhammad
    Alaskar, Hind F.
    Rehmat, Asim
    [J]. APPLIED SCIENCES-BASEL, 2021, 11 (05):
  • [2] Semi-supervised Sentiment Annotation of Large Corpora
    Brum, Henrico Bertini
    Volpe Nunes, Maria das Gracas
    [J]. COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2018, 2018, 11122 : 385 - 395
  • [3] Sentiment Analysis on Twitter data with Semi-Supervised Doc2Vec
    Bilgin, Metin
    Senturk, Izzet Fatih
    [J]. 2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2017, : 661 - 666
  • [4] A Saudi Dialect Twitter Corpus for Sentiment and Emotion Analysis
    Al-Thubaity, Abdulmohsen
    Alharbi, Mohammed
    Alqahtani, Saif
    Aljandal, Abdulrahman
    [J]. 2018 21ST SAUDI COMPUTER SOCIETY NATIONAL COMPUTER CONFERENCE (NCC), 2018,
  • [5] A hybrid semi-supervised boosting to sentiment analysis
    Tanha, Jafar
    Mahmudyan, Solmaz
    Farahi, Ahmad
    [J]. INTERNATIONAL JOURNAL OF NONLINEAR ANALYSIS AND APPLICATIONS, 2021, 12 (02): : 1769 - 1784
  • [6] Sentiment analysis in Turkish: Supervised, semi-supervised, and unsupervised techniques
    Aydin, Cem Rifki
    Gungor, Tunga
    [J]. NATURAL LANGUAGE ENGINEERING, 2021, 27 (04) : 455 - 483
  • [7] Improving Twitter Sentiment Analysis with Topic-Based Mixture Modeling and Semi-Supervised Training
    Xiang, Bing
    Zhou, Liang
    [J]. PROCEEDINGS OF THE 52ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2, 2014, : 434 - 439
  • [8] A semi-supervised approach to sentiment analysis using revised sentiment strength based on SentiWordNet
    Khan, Farhan Hassan
    Qamar, Usman
    Bashir, Saba
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2017, 51 (03) : 851 - 872
  • [10] Sentiment analysis using semi-supervised learning with few labeled data
    Pan, Yuhao
    Chen, Zhiqun
    Suzuki, Yoshimi
    Fukumoto, Fumiyo
    Nishizaki, Hiromitsu
    [J]. 2020 INTERNATIONAL CONFERENCE ON CYBERWORLDS (CW 2020), 2020, : 231 - 234