Bystander Detection: Automatic Labeling Techniques using Feature Selection and Machine Learning

Cited by: 0
Authors
Gupta, Anamika [1]
Thakkar, Khushboo [1]
Bhasin, Veenu [2]
Tiwari, Aman [1]
Mathur, Vibhor [1]
Affiliations
[1] Univ Delhi, SS Coll Business Studies, Delhi, India
[2] Univ Delhi, PGDAV Coll, Delhi, India
Keywords
Bystanders; cyberbullying; machine learning; defender; instigator; impartial; toxicity; Twitter
DOI
10.14569/IJACSA.2024.01501112
Chinese Library Classification
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Hostile or aggressive behavior on an online platform by an individual or a group of people is termed cyberbullying. A bystander is someone who sees or knows about such incidents of cyberbullying. A bystander can take one of three roles: a defender, who intervenes and can mitigate the impact of bullying; an instigator, who abets the bully and can add to the victim's suffering; or an impartial onlooker, who remains neutral and observes the scenario without getting engaged. Studying the behavior of bystanders in these roles can help in shaping the scale and progression of bullying incidents. However, a lack of data hinders research in this area. Recently, in October 2023, a dataset named CYBY23 was published on Kaggle, consisting of Twitter threads with main tweets and the replies of bystanders. The dataset includes extracted features related to the toxicity and sensitivity of the main tweets and reply tweets. The dataset's authors had human annotators assign the bystander role labels. Manually labeling bystanders' roles is labor-intensive, which raises the need for an automatic labeling technique for identifying the bystander role. In this work, we aim to suggest a machine-learning model with high efficiency for the automatic labeling of bystanders. First, the dataset was re-sampled using SMOTE to balance it. Next, we experimented with 12 models using various feature engineering techniques. The best features were selected for further experimentation by removing highly correlated and less relevant features. The models were evaluated on accuracy, precision, recall, and F1 score. We found that the Random Forest Classifier (RFC) model with a certain set of features scored highest among all 12 models. The RFC model was further tested against various splits of training and test sets. The highest results were achieved using a training set of 85% and a test set of 15%, with 78.83% accuracy, 81.79% precision, 74.83% recall, and 79.45% F1 score. The automatic labeling proposed in this work will help in scaling the dataset, which will be useful for further studies related to cyberbullying.
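Two steps named in the abstract, removing highly correlated features and scoring predictions on accuracy, precision, recall, and F1, can be illustrated with a minimal standard-library sketch. This is not the authors' code. The feature names, the 0.9 correlation threshold, the toy values, and the choice of macro averaging across the three bystander roles are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def drop_correlated(features, threshold=0.9):
    """Greedy filter: keep a feature only if its absolute correlation with
    every already-kept feature is at most `threshold` (assumed value)."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1.
    Macro averaging is an assumption; the abstract does not name the averaging."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": mean(precisions),
        "recall": mean(recalls),
        "f1": mean(f1s),
    }


# Hypothetical toy columns; "severe_toxicity" is exactly 2x "toxicity".
toy_features = {
    "toxicity": [0.1, 0.4, 0.5, 0.9],
    "severe_toxicity": [0.2, 0.8, 1.0, 1.8],
    "sensitivity": [0.9, 0.1, 0.8, 0.2],
}
kept = drop_correlated(toy_features)

# Hypothetical labels for four replies.
y_true = ["defender", "defender", "instigator", "impartial"]
y_pred = ["defender", "instigator", "instigator", "impartial"]
scores = macro_scores(y_true, y_pred)
print(kept, scores)
```

In this toy run the redundant column is dropped because it is perfectly correlated with one already kept. In practice, the resampling (SMOTE) and the Random Forest Classifier mentioned in the abstract would typically come from libraries such as imbalanced-learn and scikit-learn rather than being hand-written.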
Pages: 1135-1143
Page count: 9