Comparing Pre-Trained Language Model for Arabic Hate Speech Detection

Times cited: 0
Authors
Daouadi, Kheir Eddine [1]
Boualleg, Yaakoub [1]
Guehairia, Oussama [2]
Affiliations
[1] Echahid Cheikh Larbi Tebessi Univ, Lab Vis & Artificial Intelligence, Tebessa, Algeria
[2] Mohamed Khider Univ Biskra, Fac Sci & Technol, Biskra, Algeria
Source
COMPUTACION Y SISTEMAS | 2024 / Vol. 28 / No. 02
Keywords
Arabic hate speech detection; fine-tuning; transfer learning; AraBERT; BOT;
DOI
10.13053/CyS-28-2-4130
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Hate speech detection in Arabic tweets is currently attracting the attention of researchers around the world, and these research efforts have produced a range of classification approaches. However, two of the main challenges in this context are the reliance on handcrafted features and the still-limited performance of existing approaches. In this paper, we address the task of Arabic hate speech identification on Twitter and provide a deeper understanding of the capabilities of new machine learning-based techniques. In particular, we compare the performance of traditional machine learning methods with that of recent pre-trained language models based on transfer learning, as well as with deep learning models. We conducted experiments on a benchmark dataset under a standard evaluation scenario. The experiments show that (i) multidialectal pre-trained language models outperform monolingual and multilingual ones, and (ii) fine-tuning pre-trained language models improves the accuracy of hate speech detection in Arabic tweets. Our main contribution is obtaining promising results for Arabic by applying multidialectal pre-trained language models trained on Twitter data.
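The comparison summarized in the abstract scores several classifiers (traditional machine learning, deep learning, and fine-tuned pre-trained language models) against the same benchmark test labels. The record does not give the authors' exact evaluation code. Beyond accuracy, it also does not name the metrics used.

Below is a minimal sketch of such a side-by-side scoring step in plain Python. It assumes binary labels (1 = hate, 0 = not hate) and uses accuracy plus macro-averaged F1. The function names, system names, and toy labels are hypothetical illustrations. They are not the paper's method, models, or results.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of predictions that match the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f1_for_class(gold: Sequence[int], pred: Sequence[int], label: int) -> float:
    """F1 score with `label` treated as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over the classes present in `gold`."""
    labels = sorted(set(gold))
    return sum(f1_for_class(gold, pred, c) for c in labels) / len(labels)


def rank_systems(
    gold: Sequence[int], systems: Dict[str, List[int]]
) -> List[Tuple[str, float, float]]:
    """Score every system on the same test labels; sort by macro-F1, then accuracy."""
    rows = [(name, accuracy(gold, p), macro_f1(gold, p)) for name, p in systems.items()]
    return sorted(rows, key=lambda r: (r[2], r[1]), reverse=True)


if __name__ == "__main__":
    # Toy labels (1 = hate, 0 = not hate); hypothetical, not data from the paper.
    gold = [1, 0, 0, 1, 0, 1]
    systems = {
        "traditional_baseline": [1, 0, 1, 0, 0, 1],  # hypothetical predictions
        "fine_tuned_plm": [1, 0, 0, 1, 0, 0],  # hypothetical predictions
    }
    for name, acc, mf1 in rank_systems(gold, systems):
        print(f"{name}: accuracy={acc:.3f} macro_f1={mf1:.3f}")
```

With these toy labels, the hypothetical `fine_tuned_plm` entry ranks first. Macro-F1 is used as the primary sort key here because it weights the hate and non-hate classes equally, which matters when the hate class is the minority. This ordering is a design assumption, not a detail taken from the paper.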
Pages: 681-693
Number of pages: 13