Detection of Arabic offensive language in social media using machine learning models

Authors
Mousa, Aya [1]
Shahin, Ismail [1]
Nassif, Ali Bou [2]
Elnagar, Ashraf [3]
Affiliations
[1] Univ Sharjah, Dept Elect Engn, Sharjah, U Arab Emirates
[2] Univ Sharjah, Dept Comp Engn, Sharjah, U Arab Emirates
[3] Univ Sharjah, Dept Comp Sci, Sharjah, U Arab Emirates
Keywords
Arabic text classification; Cascaded model; Machine learning; Multiclass detection; Offensive language;
DOI
10.1016/j.iswa.2024.200376
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This research aims to detect different types of Arabic offensive language on Twitter. It uses a multiclass classification system in which each tweet is assigned to one or more offensive language types based on the word(s) it contains. Five classes are considered: bullying, insult, racism, obscene, and non-offensive. To classify the abusive language, this work presents a cascaded model consisting of Bidirectional Encoder Representations from Transformers (BERT) models (AraBERT, ArabicBERT, XLM-RoBERTa, GigaBERT, mBERT, and QARiB), deep learning models (1D-CNN, BiLSTM), and a Radial Basis Function (RBF) classifier. In addition, several traditional machine learning models are evaluated. The dataset is collected from Twitter and is balanced, with the same number of tweets in each class. Each tweet is assigned to one or more of the selected offensive language types to build multiclass and multilabel systems. A binary dataset is also constructed by assigning each tweet to either the offensive or the non-offensive class. The highest results are obtained by the cascaded model that begins with ArabicBERT, followed by BiLSTM and RBF, with an accuracy, precision, recall, and F1-score of 98.4%, 98.2%, 92.8%, and 98.4%, respectively. Among the traditional classifiers, RBF records the highest results, with 60% for each of accuracy, precision, recall, and F1-score, while KNN records the lowest, with 45%, 46%, 45%, and 43% for accuracy, precision, recall, and F1-score, respectively.
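As an illustration of two steps the abstract mentions, the following Python sketch shows how a binary (offensive vs. non-offensive) label might be derived from a multilabel annotation, and how accuracy, precision, recall, and F1-score might be computed. It is a minimal sketch and not the authors' implementation: the function names `to_binary` and `macro_scores`, the label strings, the toy annotations, and the choice of macro-averaging are all assumptions, since the abstract does not state how the metrics were averaged.

```python
from typing import Dict, List, Set

# Class names follow the abstract; the exact label strings are an assumption.
OFFENSIVE_TYPES = {"bullying", "insult", "racism", "obscene"}


def to_binary(labels: Set[str]) -> str:
    """Collapse a multilabel annotation into offensive / non-offensive."""
    return "offensive" if labels & OFFENSIVE_TYPES else "non-offensive"


def macro_scores(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1-score.

    Macro-averaging is an assumption made for this sketch.
    """
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy annotations (hypothetical, not taken from the paper's dataset).
annotations = [{"insult"}, {"nonoffensive"}, {"bullying", "racism"}, {"nonoffensive"}]
y_true = [to_binary(a) for a in annotations]
# Hypothetical classifier output with one false positive.
y_pred = ["offensive", "offensive", "offensive", "non-offensive"]
scores = macro_scores(y_true, y_pred)
```

In practice a library routine such as scikit-learn's metric functions would usually replace the hand-written loop; the explicit version is shown only to make the definitions of the four measures visible.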
Pages: 13