Bystander Detection: Automatic Labeling Techniques using Feature Selection and Machine Learning

Cited by: 0
Authors
Gupta, Anamika [1]
Thakkar, Khushboo [1]
Bhasin, Veenu [2]
Tiwari, Aman [1]
Mathur, Vibhor [1]
Affiliations
[1] Univ Delhi, SS Coll Business Studies, Delhi, India
[2] Univ Delhi, PGDAV Coll, Delhi, India
Keywords
Bystanders; cyberbullying; machine learning; defender; instigator; impartial; toxicity; Twitter
DOI
10.14569/IJACSA.2024.01501112
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Cyberbullying is hostile or aggressive behavior on an online platform by an individual or a group of people. A bystander is someone who sees or knows about such incidents. A bystander can take one of three roles: a defender, who intervenes and can mitigate the impact of the bullying; an instigator, who acts as an accomplice to the bully and can add to the victim's suffering; or an impartial onlooker, who remains neutral and observes without getting engaged. Because bystanders influence the scale and progression of bullying incidents, studying their roles is valuable, but a lack of data hinders research in this area. A dataset, CYBY23, of Twitter threads containing main tweets and bystanders' replies was published on Kaggle in October 2023. The dataset includes extracted features related to the toxicity and sensitivity of the main tweets and the reply tweets, and its authors used manual annotators to assign the bystander-role labels. Manual labeling of bystander roles is labor-intensive, which creates the need for an automatic labeling technique. In this work, we aim to suggest a machine-learning model with high efficiency for the automatic labeling of bystanders. First, the dataset was re-sampled using SMOTE to balance it. Next, we experimented with 12 models using various feature engineering techniques. The best features were selected for further experimentation by removing highly correlated and less relevant features. The models were evaluated on accuracy, precision, recall, and F1 score. We found that the Random Forest Classifier (RFC) model with a particular set of features scored highest among all 12 models. The RFC model was then tested on various splits of training and test sets. The best results were achieved with a training set of 85% and a test set of 15%: 78.83% accuracy, 81.79% precision, 74.83% recall, and 79.45% F1 score.
The automatic labeling proposed in this work will help in scaling the dataset, which will be useful for further studies related to cyberbullying.
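The abstract names three generic steps that can be sketched without reference to the paper's code: dropping highly correlated features, SMOTE-style oversampling, and scoring with accuracy, precision, recall, and F1. The block below is a minimal pure-Python illustration, not the authors' implementation. The function names, the 0.9 correlation threshold, the neighbour count `k`, the toy data, and the choice of macro-averaging are all assumptions made for this example; the abstract specifies none of them. Training the Random Forest itself is omitted.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is <= threshold.

    `columns` maps a (hypothetical) feature name to its list of values.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def smote_like(rows, labels, k=2, seed=0):
    """Oversample each minority class up to the majority class size.

    Each synthetic row lies on the segment between a random class member and
    one of its k nearest same-class neighbours, which is the core SMOTE idea.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        members = [r for r, lab in zip(rows, labels) if lab == cls]
        if len(members) < 2:
            continue  # no neighbour to interpolate toward
        for _ in range(target - count):
            base = rng.choice(members)
            neighbours = sorted(
                (m for m in members if m is not base),
                key=lambda m: math.dist(base, m),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()
            out_rows.append([b + gap * (o - b) for b, o in zip(base, other)])
            out_labels.append(cls)
    return out_rows, out_labels


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    classes = sorted(set(y_true))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(classes)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy data with made-up feature names and values, for illustration only.
features = {"toxicity": [1, 2, 3, 4], "toxicity_x2": [2, 4, 6, 8], "flag": [1, 0, 1, 0]}
selected = drop_correlated(features)

rows = [[0, 0], [1, 1], [0, 1], [5, 5], [6, 6]]
labels = ["impartial", "impartial", "impartial", "defender", "defender"]
balanced_rows, balanced_labels = smote_like(rows, labels)

scores = macro_scores(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

In practice, oversampling is usually applied only to the training portion after the train/test split, so that synthetic rows do not leak into the test set. The abstract does not say which order the authors used.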
Pages: 1135-1143
Page count: 9