Comparing Pre-Trained Language Model for Arabic Hate Speech Detection

Cited by: 0

Authors:

Daouadi, Kheir Eddine [1]

Boualleg, Yaakoub [1]

Guehairia, Oussama [2]

Affiliations:

[1] Echahid Cheikh Larbi Tebessi Univ, Lab Vis & Artificial Intelligence, Tebessa, Algeria

[2] Mohamed Khider Univ Biskra, Fac Sci & Technol, Biskra, Algeria

Source:

COMPUTACION Y SISTEMAS | 2024 / Vol. 28 / No. 02

Keywords:

Arabic hate speech detection; fine-tuning; transfer learning; AraBERT; BOT;

DOI:

10.13053/CyS-28-2-4130

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Today, hate speech detection from Arabic tweets attracts the attention of several researchers around the world. Different classification approaches have been proposed as a result of these research efforts. However, two of the main challenges confronted in this context are the use of handcrafted features and the fact that their performance rate is still limited. In this paper, we address the task of Arabic hate speech identification on Twitter and provide a deeper understanding of the capabilities of new techniques based on machine learning. In particular, we compare the performance of traditional machine learning methods with recently pre-trained language models based on Transfer Learning as well as deep learning models. We conducted experiments on a benchmark dataset with a standard evaluation scenario. Experiments show that: the multidialectal pre-trained language models outperform monolingual and multilingual ones; the fine-tuning of pre-trained language models improves the accuracy results of hate speech detection from Arabic tweets. Our main contribution is the achievement of promising results in Arabic by applying multidialectal pre-trained language models trained on Twitter data.

Pages: 681-693

Page count: 13
