The effect of rebalancing techniques on the classification performance in cyberbullying datasets

Authors
Marwa Khairy
Tarek M. Mahmoud
Tarek Abd-El-Hafeez
Affiliations
[1] Minia University, Faculty of Computers and Information
[2] University of Sadat City, Faculty of Computers and Artificial Intelligence, Computer Science Department
[3] Minia University, Computer Science Department, Faculty of Science
[4] Deraya University
Source
Neural Computing & Applications, 2024, 36 (03)
Keywords
Classification; Cyberbullying; Undersampling; Oversampling; SMOTE; TOMEK;
DOI
Not available
Abstract
Cyberbullying detection systems increasingly rely on machine learning techniques. However, class imbalance in cyberbullying datasets, where samples labeled normal outnumber those labeled abnormal, presents a significant challenge for classification algorithms. The issue is particularly problematic in two-class datasets, where conventional machine learning methods tend to perform poorly on minority-class samples because of the influence of the majority class. To address this problem, researchers have proposed various oversampling and undersampling techniques. In this paper, we investigate how effectively such techniques address class imbalance in cyberbullying datasets. We conduct an experimental study that includes a preprocessing step to enhance the performance of the machine learning algorithms. We then examine the impact of imbalanced data on classification performance for four cyberbullying datasets. To study classification performance on balanced cyberbullying datasets, we employ four resampling techniques: random undersampling, random oversampling, SMOTE, and SMOTE + TOMEK. We evaluate the impact of each rebalancing technique on classification performance using eight well-known classification algorithms. Our findings show that the performance of resampling techniques depends on the dataset size, the imbalance ratio, and the classifier used. The experiments indicate that no single technique always outperforms the others.
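The resampling ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which library or parameters were used, and the function names (`random_undersample`, `random_oversample`, `smote_like`), the toy data, and the parameter `k` are all hypothetical. The SMOTE-style function shows only the core interpolation step between minority-class neighbors, and the Tomek-link cleaning used in SMOTE + TOMEK is not shown.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect feature rows per class label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(X, y, seed=0):
    """Randomly drop samples until every class matches the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate samples until every class matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_max = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority_label, k=1, seed=0):
    """SMOTE-style sketch: add synthetic minority points by interpolating
    between a minority sample and one of its k nearest minority neighbors.
    Assumes the minority class has at least two samples."""
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_needed = max(Counter(y).values()) - len(minority)
    X_out, y_out = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        # Sort the remaining minority samples by squared Euclidean distance.
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        X_out.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
        y_out.append(minority_label)
    return X_out, y_out


# Toy two-class data (made up): six "normal" rows (0), two "bullying" rows (1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.1],
     [2.0, 2.0], [2.2, 2.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

print(Counter(random_undersample(X, y)[1]))
print(Counter(random_oversample(X, y)[1]))
print(Counter(smote_like(X, y, minority_label=1)[1]))
```

In practice, resampling of this kind is normally applied to the training split only, so that the test data keeps the original class distribution.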
Pages: 1049-1065
Page count: 16
Related papers
  • [1] The effect of rebalancing techniques on the classification performance in cyberbullying datasets
    Khairy, Marwa
    Mahmoud, Tarek M.
    Abd-El-Hafeez, Tarek
    Neural Computing & Applications, 2024, 36 (03): 1049-1065