Detection of Arabic offensive language in social media using machine learning models

Cited by: 2
Authors
Mousa, Aya [1]
Shahin, Ismail [1]
Nassif, Ali Bou [2]
Elnagar, Ashraf [3]
Affiliations
[1] Univ Sharjah, Dept Elect Engn, Sharjah, U Arab Emirates
[2] Univ Sharjah, Dept Comp Engn, Sharjah, U Arab Emirates
[3] Univ Sharjah, Dept Comp Sci, Sharjah, U Arab Emirates
Keywords
Arabic text classification; Cascaded model; Machine learning; Multiclass detection; Offensive language
DOI
10.1016/j.iswa.2024.200376
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This research aims to detect different types of Arabic offensive language on Twitter. It uses a multiclass classification system in which each tweet is assigned to one or more offensive language types based on the word(s) it contains. Five classes are considered: bullying, insult, racism, obscene, and non-offensive. To classify abusive language, this work presents a cascaded model consisting of Bidirectional Encoder Representations from Transformers (BERT) models (AraBERT, ArabicBERT, XLM-RoBERTa, GigaBERT, mBERT, and QARiB), deep learning models (1D-CNN and BiLSTM), and a Radial Basis Function (RBF) classifier. Various traditional machine learning models are also evaluated. The dataset is collected from Twitter and is balanced, with the same number of tweets in each class. Each tweet is assigned to one or more of the selected offensive language types to build multiclass and multilabel systems. In addition, a binary dataset is constructed by assigning the tweets to offensive or non-offensive classes. The best results are obtained with the cascaded model of ArabicBERT followed by BiLSTM and RBF, with an accuracy, precision, recall, and F1-score of 98.4%, 98.2%, 92.8%, and 98.4%, respectively. Among the traditional classifiers, RBF performs best, scoring 60% on each of accuracy, precision, recall, and F1-score, while KNN performs worst, with 45%, 46%, 45%, and 43% in accuracy, precision, recall, and F1-score, respectively.
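The abstract describes deriving a binary offensive/non-offensive dataset from the multilabel annotations and reporting accuracy, precision, recall, and F1-score. The Python sketch below illustrates both steps under stated assumptions: the mapping rule (a tweet is offensive if it carries any of the four offensive types) and the helper names `to_binary_label` and `binary_metrics` are hypothetical, not taken from the paper, and the metrics are computed for the offensive class only rather than with whatever averaging the authors used. The toy annotations are invented for illustration.

```python
from typing import Dict, Sequence, Set

# The four offensive types named in the abstract; any other label
# (e.g. "nonoffensive") is treated as non-offensive in this sketch.
OFFENSIVE_TYPES = frozenset({"bullying", "insult", "racism", "obscene"})


def to_binary_label(labels: Set[str]) -> str:
    """Collapse one tweet's multilabel annotation into a binary label.

    Assumption (not from the paper): a tweet is offensive if it carries at
    least one offensive type, and non-offensive otherwise.
    """
    return "offensive" if labels & OFFENSIVE_TYPES else "non-offensive"


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                   positive: str = "offensive") -> Dict[str, float]:
    """Accuracy plus precision, recall and F1 for the positive class only."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("need at least one (gold, predicted) pair")
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy annotations, not drawn from the paper's dataset.
    gold = [{"insult"}, {"racism", "obscene"}, {"nonoffensive"}, {"bullying"}]
    pred = [{"insult"}, {"nonoffensive"}, {"nonoffensive"}, {"bullying", "insult"}]
    y_true = [to_binary_label(s) for s in gold]
    y_pred = [to_binary_label(s) for s in pred]
    scores = binary_metrics(y_true, y_pred)
    print({k: round(v, 3) for k, v in scores.items()})
    # prints {'accuracy': 0.75, 'precision': 1.0, 'recall': 0.667, 'f1': 0.8}
```

Scores over all five classes would require the same counts per class followed by an averaging step (macro or weighted); the abstract does not say which averaging produced the reported figures, so this sketch deliberately stays with the single-class binary case.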
Pages: 13