Bystander Detection: Automatic Labeling Techniques using Feature Selection and Machine Learning

Authors
Gupta, Anamika [1]
Thakkar, Khushboo [1]
Bhasin, Veenu [2]
Tiwari, Aman [1]
Mathur, Vibhor [1]
Affiliations
[1] Univ Delhi, SS Coll Business Studies, Delhi, India
[2] Univ Delhi, PGDAV Coll, Delhi, India
Keywords
Bystanders; cyberbullying; machine learning; defender; instigator; impartial; toxicity; Twitter
DOI
10.14569/IJACSA.2024.01501112
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Hostile or aggressive behavior by an individual or a group of people on an online platform is termed cyberbullying. A bystander is someone who witnesses or knows about such an incident of cyberbullying. A bystander may act as a defender, who intervenes and can mitigate the impact of bullying; an instigator, who acts as an accomplice to the bully and can add to the victim's suffering; or an impartial onlooker, who remains neutral and observes the situation without getting engaged. Studying bystanders' roles can help in understanding how they shape the scale and progression of bullying incidents. However, a lack of data hinders research in this area. Recently, CYBY23, a dataset of Twitter threads consisting of main tweets and bystanders' replies, was published on Kaggle in October 2023. The dataset includes extracted features related to the toxicity and sensitivity of the main tweets and the reply tweets, and the authors had human annotators assign the bystander-role labels. Because manually labeling bystanders' roles is labor-intensive, an automatic technique for identifying the bystander role is needed. In this work, we propose an efficient machine learning model for automatically labeling bystanders. First, the dataset was resampled using SMOTE (Synthetic Minority Over-sampling Technique) to balance the classes. Next, we experimented with 12 models using various feature engineering techniques. The best features were selected for further experimentation by removing highly correlated and less relevant features. The models were evaluated on accuracy, precision, recall, and F1 score. The Random Forest Classifier (RFC) with a selected set of features scored highest among all 12 models. The RFC model was then tested on various splits of training and test sets. The best results were achieved with an 85% training set and a 15% test set: 78.83% accuracy, 81.79% precision, 74.83% recall, and 79.45% F1 score.
The automatic labeling proposed in this work will help scale the dataset, which will be useful for further studies related to cyberbullying.
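The two preprocessing steps named in the abstract, SMOTE oversampling followed by removal of highly correlated features, can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `smote`, `balance`, `pearson`, and `drop_correlated`, the neighbour count `k=5`, the correlation threshold of 0.9, the choice to oversample every class up to the largest class, and the toy features `f0` to `f2` are all hypothetical. The removal of "less relevant" features is not covered.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a randomly chosen
    minority row and one of its k nearest neighbours in the same class."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [row for j, row in enumerate(minority) if j != i]
        # k nearest neighbours of `base` by Euclidean distance
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class (assumption)."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)
    for label, count in counts.items():
        if count < target:
            minority = [row for row, lab in zip(X, y) if lab == label]
            new_rows = smote(minority, target - count, k=k, seed=seed)
            X_out.extend(new_rows)
            y_out.extend([label] * len(new_rows))
    return X_out, y_out


def pearson(a, b):
    """Pearson correlation of two equal-length columns (0.0 if one is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def drop_correlated(X, names, threshold=0.9):
    """Greedily keep a feature only if |r| < threshold against every kept feature."""
    cols = list(zip(*X))
    keep = []
    for j in range(len(names)):
        if all(abs(pearson(cols[j], cols[i])) < threshold for i in keep):
            keep.append(j)
    return [[row[j] for j in keep] for row in X], [names[j] for j in keep]


# Toy data with hypothetical features f0..f2 and two of the bystander roles.
X = [[0, 0, 1.0], [1, 2, 0.0], [2, 4, 1.0], [3, 6, 0.5], [4, 8, 0.0]]
y = ["impartial", "impartial", "impartial", "defender", "defender"]

X_bal, y_bal = balance(X, y)
print(Counter(y_bal))
X_red, kept = drop_correlated(X, ["f0", "f1", "f2"])
print(kept)  # ['f0', 'f2']  (f1 is perfectly correlated with f0)
```

In practice, imbalanced-learn provides `imblearn.over_sampling.SMOTE`, and scikit-learn provides `RandomForestClassifier` and `train_test_split` (for example with `test_size=0.15` for an 85/15 split); the pure-Python version above only makes the interpolation and the greedy correlation filter explicit.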
Pages: 1135-1143
Number of pages: 9