Detection of Arabic offensive language in social media using machine learning models

Cited by: 1
Authors
Mousa, Aya [1]
Shahin, Ismail [1]
Nassif, Ali Bou [2]
Elnagar, Ashraf [3]
Affiliations
[1] Univ Sharjah, Dept Elect Engn, Sharjah, U Arab Emirates
[2] Univ Sharjah, Dept Comp Engn, Sharjah, U Arab Emirates
[3] Univ Sharjah, Dept Comp Sci, Sharjah, U Arab Emirates
Source
Keywords
Arabic text classification; Cascaded model; Machine learning; Multiclass detection; Offensive language
DOI
10.1016/j.iswa.2024.200376
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This research aims to detect different types of Arabic offensive language on Twitter. It uses a multiclass classification system in which each tweet is assigned to one or more offensive language types based on the word(s) it contains. Five classes are considered: bullying, insult, racism, obscene, and non-offensive. To classify the abusive language, this work presents a cascaded model consisting of Bidirectional Encoder Representations from Transformers (BERT) models (AraBERT, ArabicBERT, XLM-RoBERTa, GigaBERT, MBERT, and QARiB), deep learning models (1D-CNN, BiLSTM), and Radial Basis Function (RBF). Various traditional machine learning models are also evaluated. The dataset is collected from Twitter and is balanced, so each class has the same number of tweets. Each tweet is assigned to one or more of the selected offensive language types to build multiclass and multilabel systems. In addition, a binary dataset is constructed by assigning each tweet to the offensive or non-offensive class. The best results are obtained with the cascaded model that starts with ArabicBERT, followed by BiLSTM and RBF, which achieves an accuracy, precision, recall, and F1-score of 98.4%, 98.2%, 92.8%, and 98.4%, respectively. Among the traditional classifiers, RBF records the highest results, with 60% for each of accuracy, precision, recall, and F1-score, while KNN records the lowest, with 45%, 46%, 45%, and 43% for accuracy, precision, recall, and F1-score, respectively.
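The binary dataset construction and the four reported metrics can be illustrated with a minimal Python sketch. It is not the authors' code: the class names come from the abstract, while the helper names (`to_binary`, `macro_scores`), the toy annotations, and the choice of macro averaging are assumptions made for illustration only, since the abstract does not state how the metrics were averaged.

```python
from typing import Dict, List, Set

# Offensive classes named in the abstract; the helpers below are hypothetical.
OFFENSIVE_TYPES = {"bullying", "insult", "racism", "obscene"}


def to_binary(labels: Set[str]) -> str:
    """Collapse a multilabel annotation into offensive / non-offensive."""
    return "offensive" if labels & OFFENSIVE_TYPES else "non-offensive"


def macro_scores(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy multilabel annotations (invented for this example, not from the dataset).
annotations = [{"insult", "racism"}, {"non-offensive"}, {"bullying"}, {"non-offensive"}]
y_true = [to_binary(a) for a in annotations]
y_pred = ["offensive", "offensive", "offensive", "non-offensive"]
scores = macro_scores(y_true, y_pred)
print(scores)
```

With macro averaging, each class contributes equally to precision, recall, and F1, which is a natural fit for a balanced dataset such as the one described; other averaging schemes would give different numbers on imbalanced data.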
Pages: 13