Detection of Arabic offensive language in social media using machine learning models

Cited by: 2

Authors:

Mousa, Aya [1]

Shahin, Ismail [1]

Nassif, Ali Bou [2]

Elnagar, Ashraf [3]

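Affiliations:

[1] University of Sharjah, Department of Electrical Engineering, Sharjah, United Arab Emirates

[2] University of Sharjah, Department of Computer Engineering, Sharjah, United Arab Emirates

[3] University of Sharjah, Department of Computer Science, Sharjah, United Arab Emirates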

Source:

INTELLIGENT SYSTEMS WITH APPLICATIONS | 2024 / Vol. 22

Keywords:

Arabic text classification; Cascaded model; Machine learning; Multiclass detection; Offensive language

DOI:

10.1016/j.iswa.2024.200376

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This research aims to detect different types of Arabic offensive language on Twitter. It uses a multiclass classification system in which each tweet is categorized into one or more offensive language types based on the word(s) it contains. Five classes are considered in this study: bullying, insult, racism, obscene, and non-offensive. To classify the abusive language, this work presents a cascaded model consisting of Bidirectional Encoder Representations from Transformers (BERT) models (AraBERT, ArabicBERT, XLMRoBERTa, GigaBERT, MBERT, and QARiB), deep learning models (1D-CNN, BiLSTM), and a Radial Basis Function (RBF) classifier. In addition, various traditional machine learning models are evaluated. The dataset is collected from Twitter and is balanced, meaning each class has the same number of tweets. Each tweet is assigned to one or more of the selected offensive language types to build multiclass and multilabel systems. In addition, a binary dataset is constructed by assigning each tweet to either the offensive or the non-offensive class. The highest results are obtained by the cascaded model that starts with ArabicBERT, followed by BiLSTM and RBF, with an accuracy, precision, recall, and F1-score of 98.4%, 98.2%, 92.8%, and 98.4%, respectively. Among the traditional classifiers, RBF records the highest results, scoring 60% on each of accuracy, precision, recall, and F1-score, while KNN records the lowest results, obtaining 45%, 46%, 45%, and 43% in terms of accuracy, precision, recall, and F1-score, respectively.
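The abstract states that a binary dataset is derived by assigning each tweet to an offensive or non-offensive class. A minimal sketch of one way such a collapse could work is given below. The label strings, the `to_binary` helper, and the example annotations are all hypothetical illustrations; the record does not describe the authors' actual procedure or data format.

```python
# Hypothetical label names based on the five classes listed in the abstract.
OFFENSIVE_TYPES = {"bullying", "insult", "racism", "obscene"}
NON_OFFENSIVE = "non-offensive"


def to_binary(labels):
    """Collapse one tweet's multilabel annotation into a single binary label.

    Assumption: a tweet counts as offensive if it carries at least one
    of the four offensive types.
    """
    label_set = set(labels)
    if not label_set:
        raise ValueError("a tweet needs at least one label")
    unknown = label_set - OFFENSIVE_TYPES - {NON_OFFENSIVE}
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return "offensive" if label_set & OFFENSIVE_TYPES else NON_OFFENSIVE


# Made-up annotations for illustration only (not taken from the dataset).
example_annotations = [
    ["insult", "racism"],
    ["non-offensive"],
    ["obscene"],
]
binary_labels = [to_binary(labels) for labels in example_annotations]
```

Under this assumption, a tweet annotated with several offensive types still maps to a single "offensive" label, so the binary dataset has one label per tweet even when the multilabel annotation has more.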

Pages: 13
