AraXLNet: pre-trained language model for sentiment analysis of Arabic

Cited by: 11
Authors
Alduailej, Alhanouf [1 ]
Alothaim, Abdulrahman [1 ]
Affiliation
[1] King Saud Univ, Dept Informat Syst, Coll Comp & Informat Sci, Riyadh 11451, Saudi Arabia
Keywords
Sentiment analysis; Language models; NLP; XLNet; AraXLNet; Text mining; Neural network
DOI
10.1186/s40537-022-00625-z
CLC number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Arabic is a complex, low-resource language, and these limitations make it challenging to build accurate text classification systems for tasks such as sentiment analysis. The main goal of sentiment analysis is to determine whether the overall orientation of a given text is positive, negative, or neutral. Recently, pre-trained language models have substantially improved the accuracy of text classification in English. These models are pre-trained on large corpora and then fine-tuned on downstream tasks. In particular, XLNet has achieved state-of-the-art results on diverse natural language processing (NLP) tasks in English. In this paper, we hypothesize that similar success can be achieved in Arabic. We support this hypothesis by producing the first XLNet-based language model for Arabic, called AraXLNet, and demonstrating its use in Arabic sentiment analysis to improve prediction accuracy on such tasks. The results show that the proposed model, AraXLNet, combined with the Farasa segmenter, achieved accuracies of 94.78%, 93.01%, and 85.77% on Arabic sentiment analysis using multiple benchmark datasets. These results outperform AraBERT, which obtained 84.65%, 92.13%, and 85.05% on the same datasets, respectively. The improved accuracy of the proposed model was evident across multiple benchmark datasets, offering a promising advance in Arabic text classification.
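As a rough illustration of the workflow described in the abstract (pre-train an XLNet-style model, segment input with Farasa, fine-tune for sentiment classification), the sketch below shows how a fine-tuned checkpoint could be queried for Arabic sentiment with the Hugging Face transformers library. The checkpoint name, the three-way label order, and the assumption that inputs are already Farasa-segmented are placeholders for illustration, not details taken from the paper.

```python
# Minimal inference sketch, not the authors' released code: querying a fine-tuned
# XLNet-style checkpoint for Arabic sentiment analysis via Hugging Face transformers.
# "araxlnet-sentiment" is a hypothetical checkpoint name, and the label order is
# assumed; the paper applies Farasa segmentation to the text before tokenization.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "araxlnet-sentiment"  # hypothetical checkpoint identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)
model.eval()

LABELS = ["negative", "neutral", "positive"]  # assumed label order

def predict_sentiment(segmented_text: str) -> str:
    """Classify one (Farasa-segmented) Arabic sentence."""
    inputs = tokenizer(segmented_text, return_tensors="pt",
                       truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(torch.argmax(logits, dim=-1))]

print(predict_sentiment("الخدمة ممتازة والتوصيل سريع"))  # expected: "positive"
```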
Pages: 21
Related papers
50 records in total
  • [1] AraXLNet: pre-trained language model for sentiment analysis of Arabic
    Alhanouf Alduailej
    Abdulrahman Alothaim
    [J]. Journal of Big Data, 9
  • [2] Leveraging Pre-trained Language Model for Speech Sentiment Analysis
    Shon, Suwon
    Brusco, Pablo
    Pan, Jing
    Han, Kyu J.
    Watanabe, Shinji
    [J]. INTERSPEECH 2021, 2021, : 3420 - 3424
  • [3] A Comparative Study of Pre-trained Word Embeddings for Arabic Sentiment Analysis
    Zouidine, Mohamed
    Khalil, Mohammed
    [J]. 2022 IEEE 46TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE (COMPSAC 2022), 2022, : 1243 - 1248
  • [4] Aspect Based Sentiment Analysis by Pre-trained Language Representations
    Liang Tianxin
    Yang Xiaoping
    Zhou Xibo
    Wang Bingqian
    [J]. 2019 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2019), 2019, : 1262 - 1265
  • [5] TwitterBERT: Framework for Twitter Sentiment Analysis Based on Pre-trained Language Model Representations
    Azzouza, Noureddine
    Akli-Astouati, Karima
    Ibrahim, Roliana
    [J]. EMERGING TRENDS IN INTELLIGENT COMPUTING AND INFORMATICS: DATA SCIENCE, INTELLIGENT INFORMATION SYSTEMS AND SMART COMPUTING, 2020, 1073 : 428 - 437
  • [6] Comparing Pre-Trained Language Model for Arabic Hate Speech Detection
    Daouadi, Kheir Eddine
    Boualleg, Yaakoub
    Guehairia, Oussama
    [J]. COMPUTACION Y SISTEMAS, 2024, 28 (02): : 681 - 693
  • [7] Enhancing Turkish Sentiment Analysis Using Pre-Trained Language Models
    Koksal, Omer
    [J]. 29TH IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS (SIU 2021), 2021,
  • [8] Incorporating Dynamic Semantics into Pre-Trained Language Model for Aspect-based Sentiment Analysis
    Zhang, Kai
    Zhang, Kun
    Zhang, Mengdi
    Zhao, Hongke
    Liu, Qi
    Wu, Wei
    Chen, Enhong
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 3599 - 3610
  • [9] Sentiment Analysis Using Pre-Trained Language Model With No Fine-Tuning and Less Resource
    Kit, Yuheng
    Mokji, Musa Mohd
    [J]. IEEE ACCESS, 2022, 10 : 107056 - 107065
  • [10] An Entity-Level Sentiment Analysis of Financial Text Based on Pre-Trained Language Model
    Huang, Zhihong
    Fang, Zhijian
    [J]. 2020 IEEE 18TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), VOL 1, 2020, : 391 - 396