Bystander Detection: Automatic Labeling Techniques using Feature Selection and Machine Learning

Times cited: 0

Authors:

Gupta, Anamika [1]

Thakkar, Khushboo [1]

Bhasin, Veenu [2]

Tiwari, Aman [1]

Mathur, Vibhor [1]

Affiliations:

[1] Univ Delhi, SS Coll Business Studies, Delhi, India

[2] Univ Delhi, PGDAV Coll, Delhi, India

Source:

INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS | 2024 / Vol. 15 / No. 01

Keywords:

Bystanders; cyberbullying; machine learning; defender; instigator; impartial; toxicity; Twitter

DOI:

10.14569/IJACSA.2024.01501112

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Subject classification code:

081202

Abstract:

Cyberbullying is hostile or aggressive behavior on an online platform by an individual or a group of people. A bystander is someone who sees or knows about such incidents of cyberbullying. A bystander may act as a defender, who intervenes and can mitigate the impact of bullying; an instigator, who acts as the bully's accomplice and can add to the victim's suffering; or an impartial onlooker, who remains neutral and observes the scenario without getting engaged. Studying the behavior of bystanders in these roles can help in understanding how they shape the scale and progression of bullying incidents. However, a lack of data hinders research in this area. Recently, in October 2023, a dataset named CYBY23, consisting of Twitter threads with main tweets and bystanders' replies, was published on Kaggle. The dataset includes extracted features related to the toxicity and sensitivity of the main tweets and the reply tweets. The authors had manual annotators assign the bystander-role labels. Because manually labeling bystanders' roles is labor-intensive, an automatic labeling technique for identifying bystander roles is needed. In this work, we aim to propose a highly efficient machine learning model for the automatic labeling of bystanders. First, the dataset was resampled using SMOTE (Synthetic Minority Over-sampling Technique) to balance it. Next, we experimented with 12 models using various feature engineering techniques. The best features were selected for further experimentation by removing highly correlated and less relevant features. The models were evaluated on accuracy, precision, recall, and F1 score. The Random Forest Classifier (RFC) with a selected set of features scored highest among all 12 models. The RFC model was further tested with various splits of training and test sets. The best results were achieved with a training set of 85% and a test set of 15%: 78.83% accuracy, 81.79% precision, 74.83% recall, and 79.45% F1 score.
The automatic labeling proposed in this work will help in scaling the dataset, which will be useful for further studies related to cyberbullying.
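The preprocessing and evaluation steps named in the abstract (SMOTE oversampling, removal of highly correlated features, an 85/15 train/test split, and accuracy/precision/recall/F1 scoring) can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions, not the authors' code: the helper names (`smote`, `pearson`, `drop_correlated`, `split_indices`, `macro_scores`), the neighbour count `k=5`, the correlation threshold `0.9`, and macro averaging over the three bystander roles are all assumptions, because the abstract does not specify them. The Random Forest itself is left out; in practice a library classifier such as scikit-learn's `RandomForestClassifier` would be trained on the resampled, filtered training rows.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling (sketch): each synthetic sample lies on the
    segment between a random minority sample and one of its k nearest
    minority-class neighbours. k=5 is an assumed default."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Other minority samples, nearest first (Euclidean distance).
        others = sorted(
            (s for s in minority if s is not base),
            key=lambda s: math.dist(base, s),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_correlated(columns, threshold=0.9):
    """Hypothetical greedy filter: keep features in order, skipping any whose
    |r| with an already kept feature exceeds `threshold` (0.9 is an assumed
    value). `columns` maps feature name -> list of values."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def split_indices(n_rows, test_size=0.15, seed=0):
    """Shuffle row indices; return (train, test), 85/15 by default."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_test = round(n_rows * test_size)
    return idx[n_test:], idx[:n_test]


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 over all labels
    (macro averaging is an assumption, not stated in the abstract)."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precs, recs, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precs.append(prec)
        recs.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(labels)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return accuracy, sum(precs) / n, sum(recs) / n, sum(f1s) / n
```

Macro averaging gives each bystander role equal weight regardless of how many tweets carry that label. Under it, the F1 score is the mean of the per-class F1 values, so it is generally not the harmonic mean of the averaged precision and recall.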


Pages: 1135-1143

Number of pages: 9
