AraXLNet: pre-trained language model for sentiment analysis of Arabic

Cited by: 11
Authors
Alduailej, Alhanouf [1 ]
Alothaim, Abdulrahman [1 ]
Affiliation
[1] King Saud Univ, Dept Informat Syst, Coll Comp & Informat Sci, Riyadh 11451, Saudi Arabia
Keywords
Sentiment analysis; Language models; NLP; XLNet; AraXLNet; Text mining; Neural network
DOI
10.1186/s40537-022-00625-z
CLC number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Arabic is a complex, low-resource language, and these limitations make it challenging to build accurate text classification systems for tasks such as sentiment analysis. The main goal of sentiment analysis is to determine whether the overall orientation of a given text is positive, negative, or neutral. Recently, pre-trained language models have substantially improved the accuracy of text classification in English. These models are pre-trained on large corpora and then fine-tuned on downstream tasks. In particular, XLNet has achieved state-of-the-art results on diverse natural language processing (NLP) tasks in English. In this paper, we hypothesize that similar success can be achieved in Arabic. We support this hypothesis by producing the first XLNet-based language model for Arabic, called AraXLNet, and demonstrating its use in Arabic sentiment analysis to improve prediction accuracy on such tasks. The results show that the proposed model, AraXLNet, combined with the Farasa segmenter, achieved accuracies of 94.78%, 93.01%, and 85.77% on Arabic sentiment analysis using multiple benchmark datasets. These results outperform AraBERT, which obtained 84.65%, 92.13%, and 85.05% on the same datasets, respectively. The improved accuracy of the proposed model was evident across multiple benchmark datasets, offering a promising advance in Arabic text classification.
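As a rough illustration of the workflow described in the abstract (pre-train an XLNet-style model, segment input with Farasa, fine-tune for sentiment classification), the sketch below shows how a fine-tuned checkpoint could be queried for Arabic sentiment with the Hugging Face transformers library. The checkpoint name, the three-way label order, and the assumption that inputs are already Farasa-segmented are placeholders for illustration, not details taken from the paper.

```python
# Minimal inference sketch, not the authors' released code: querying a fine-tuned
# XLNet-style checkpoint for Arabic sentiment analysis via Hugging Face transformers.
# "araxlnet-sentiment" is a hypothetical checkpoint name, and the label order is
# assumed; the paper applies Farasa segmentation to the text before tokenization.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "araxlnet-sentiment"  # hypothetical checkpoint identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)
model.eval()

LABELS = ["negative", "neutral", "positive"]  # assumed label order

def predict_sentiment(segmented_text: str) -> str:
    """Classify one (Farasa-segmented) Arabic sentence."""
    inputs = tokenizer(segmented_text, return_tensors="pt",
                       truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(torch.argmax(logits, dim=-1))]

print(predict_sentiment("الخدمة ممتازة والتوصيل سريع"))  # expected: "positive"
```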
Pages: 21
Related papers
50 records in total
  • [1] AraXLNet: pre-trained language model for sentiment analysis of Arabic
    Alhanouf Alduailej
    Abdulrahman Alothaim
    [J]. Journal of Big Data, 9
  • [2] Leveraging Pre-trained Language Model for Speech Sentiment Analysis
    Shon, Suwon
    Brusco, Pablo
    Pan, Jing
    Han, Kyu J.
    Watanabe, Shinji
    [J]. INTERSPEECH 2021, 2021, : 3420 - 3424
  • [3] A Comparative Study of Pre-trained Word Embeddings for Arabic Sentiment Analysis
    Zouidine, Mohamed
    Khalil, Mohammed
    [J]. 2022 IEEE 46TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE (COMPSAC 2022), 2022, : 1243 - 1248
  • [4] Aspect Based Sentiment Analysis by Pre-trained Language Representations
    Liang Tianxin
    Yang Xiaoping
    Zhou Xibo
    Wang Bingqian
    [J]. 2019 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2019), 2019, : 1262 - 1265
  • [5] TwitterBERT: Framework for Twitter Sentiment Analysis Based on Pre-trained Language Model Representations
    Azzouza, Noureddine
    Akli-Astouati, Karima
    Ibrahim, Roliana
    [J]. EMERGING TRENDS IN INTELLIGENT COMPUTING AND INFORMATICS: DATA SCIENCE, INTELLIGENT INFORMATION SYSTEMS AND SMART COMPUTING, 2020, 1073 : 428 - 437
  • [6] Comparing Pre-Trained Language Model for Arabic Hate Speech Detection
    Daouadi, Kheir Eddine
    Boualleg, Yaakoub
    Guehairia, Oussama
    [J]. COMPUTACION Y SISTEMAS, 2024, 28 (02): : 681 - 693
  • [7] Enhancing Turkish Sentiment Analysis Using Pre-Trained Language Models
    Koksal, Omer
    [J]. 29TH IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS (SIU 2021), 2021,
  • [8] Incorporating Dynamic Semantics into Pre-Trained Language Model for Aspect-based Sentiment Analysis
    Zhang, Kai
    Zhang, Kun
    Zhang, Mengdi
    Zhao, Hongke
    Liu, Qi
    Wu, Wei
    Chen, Enhong
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 3599 - 3610
  • [9] Sentiment Analysis Using Pre-Trained Language Model With No Fine-Tuning and Less Resource
    Kit, Yuheng
    Mokji, Musa Mohd
    [J]. IEEE ACCESS, 2022, 10 : 107056 - 107065
  • [10] An Entity-Level Sentiment Analysis of Financial Text Based on Pre-Trained Language Model
    Huang, Zhihong
    Fang, Zhijian
    [J]. 2020 IEEE 18TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), VOL 1, 2020, : 391 - 396