A Deep Learning Framework for Automatic Detection of Hate Speech Embedded in Arabic Tweets

Authors
Rehab Duwairi
Amena Hayajneh
Muhannad Quwaider
Affiliation
[1] Jordan University of Science and Technology
Keywords
Arabic hate speech; Neural networks; Automatic detection of hateful speech; Deep learning; Text mining; Twitter
DOI
Not available
Abstract
In this paper, we investigate the ability of CNN, CNN-LSTM, and BiLSTM-CNN deep learning networks to automatically detect and classify hateful content posted on social media. These networks were trained and tested on the ArHS dataset, which consists of 9833 tweets annotated to suit hate speech detection in Arabic. To the best of our knowledge, this is the largest Arabic dataset that covers the subclasses of hate speech. Moreover, we investigate performance on two existing Arabic hate speech datasets merged with the ArHS dataset, resulting in a Combined dataset of 23,678 tweets. Three types of experiment are reported: first, binary classification of tweets into Hate or Normal; second, ternary classification of tweets into Hate, Abusive, or Normal; and lastly, multi-class classification of tweets into Misogyny, Racism, Religious Discrimination, Abusive, and Normal. On the ArHS dataset, in the binary classification task, the CNN model outperformed the other models and achieved an accuracy of 81%. In the ternary classification task, the CNN and BiLSTM-CNN models both achieved the best accuracy of 74%. Lastly, in the multi-class classification task, the CNN-LSTM and BiLSTM-CNN models both achieved the best accuracy of 73%. On the Combined dataset, in the binary classification task, the BiLSTM-CNN achieved an accuracy of 73%. In the ternary classification task, the BiLSTM-CNN achieved the best accuracy of 67%. Lastly, in the multi-class classification task, the CNN-LSTM and the BiLSTM-CNN both achieved the best accuracy of 65%.
Pages: 4001-4014
Page count: 13
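The abstract reports accuracy at several label granularities. As a rough illustration only, the sketch below shows how five-class labels could be collapsed to the ternary scheme and how accuracy could be computed at each level. It assumes that Misogyny, Racism, and Religious Discrimination are the hate speech subclasses, as the class lists in the abstract suggest. The function names and the toy labels are hypothetical and are not taken from the paper or the ArHS dataset. The binary scheme is left out because the abstract does not say how Abusive tweets are treated there.

```python
# Illustrative sketch; label names follow the class lists in the abstract.
# Assumption: these three classes are the hate speech subclasses.
HATE_SUBCLASSES = {"Misogyny", "Racism", "Religious Discrimination"}
FIVE_CLASS_LABELS = HATE_SUBCLASSES | {"Abusive", "Normal"}


def to_ternary(label: str) -> str:
    """Collapse a five-class label to Hate, Abusive, or Normal."""
    if label not in FIVE_CLASS_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return "Hate" if label in HATE_SUBCLASSES else label


def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Fraction of positions where the predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on empty input")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Toy labels invented for this example (not real data or model output).
gold = ["Misogyny", "Racism", "Abusive", "Normal"]
pred = ["Racism", "Racism", "Normal", "Normal"]

five_class_acc = accuracy(gold, pred)
ternary_acc = accuracy([to_ternary(g) for g in gold],
                       [to_ternary(p) for p in pred])
print(five_class_acc, ternary_acc)
```

On the toy labels, confusing one hate subclass with another counts as an error in the five-class setting but not in the ternary one, which is one reason coarser schemes tend to score higher.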