A New Hybrid Sampling Approach for Classification of Imbalanced Datasets

Cited by: 0
Author
Hanskunatai, Anantaporn [1]
Affiliation
[1] King Mongkuts Inst Technol Ladkrabang, Dept Comp Sci, Adv Artificial Intelligence Res Lab, Bangkok 10520, Thailand
Keywords
imbalanced dataset; SMOTE; DBSCAN; hybrid sampling; decision tree; naive bayes;
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract
We live in a data-driven era. Many organizations around the world, in sectors including banking, industry, commerce, and medicine, aim to extract knowledge from huge amounts of data. Most real-world datasets, however, suffer from class imbalance. This paper presents a new algorithm for handling imbalanced classification. The proposed technique is a hybrid sampling approach that combines the well-known oversampling algorithm SMOTE with an undersampling technique that removes ambiguous instances from the majority class. The experimental results show that the new hybrid sampling method yields better predictive performance in terms of F-measure than other sampling techniques. In addition, compared with the original dataset, it improves F-measure by up to 59.73% and 412.26% with decision tree and naive Bayes classifiers, respectively.
Pages: 67-71
Page count: 5
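The hybrid idea in the abstract (oversample the minority class with SMOTE, undersample the majority class by removing ambiguous instances) can be illustrated with a minimal pure-Python sketch. The abstract does not say how ambiguous instances are identified; the keywords suggest DBSCAN plays a role, but that detail is not available here. The sketch therefore swaps in a simple nearest-neighbour rule as a stand-in: a majority instance counts as ambiguous when at least half of its k nearest neighbours belong to the minority class. The order of the two steps, the function names (`smote`, `drop_ambiguous`, `hybrid_sample`, `f_measure`), the parameter `k`, and the toy data are all assumptions made for illustration, not the paper's method.

```python
import math
import random


def knn(point, pool, k):
    """Return the k items of pool closest to point (Euclidean distance)."""
    return sorted(pool, key=lambda q: math.dist(point, q))[:k]


def smote(minority, n_new, k=3, rng=None):
    """SMOTE-style oversampling: each synthetic point lies on the segment
    between a random minority point and one of its k nearest minority
    neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = knn(base, [p for p in minority if p is not base], k)
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def drop_ambiguous(majority, minority, k=3):
    """ASSUMPTION (stand-in rule, not the paper's): keep a majority point
    only if fewer than half of its k nearest neighbours are minority."""
    kept = []
    for p in majority:
        others = [(q, 0) for q in majority if q is not p]
        others += [(q, 1) for q in minority]
        nearest = sorted(others, key=lambda item: math.dist(p, item[0]))[:k]
        minority_votes = sum(label for _, label in nearest)
        if minority_votes * 2 < k:
            kept.append(p)
    return kept


def hybrid_sample(majority, minority, k=3, seed=0):
    """Undersample first, then oversample the minority class up to the
    size of the cleaned majority class (the order is an assumption)."""
    kept = drop_ambiguous(majority, minority, k)
    n_new = max(0, len(kept) - len(minority))
    synthetic = smote(minority, n_new, k, random.Random(seed))
    return kept, minority + synthetic


def f_measure(tp, fp, fn):
    """F-measure: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: six majority points near the origin, plus two majority points
# that sit inside the minority region and are therefore ambiguous.
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
            (0.2, 0.4), (0.4, 0.2), (2.0, 2.0), (2.1, 1.9)]
minority = [(2.0, 2.1), (2.2, 2.0), (1.9, 1.8)]

kept_majority, balanced_minority = hybrid_sample(majority, minority)
print(len(kept_majority), len(balanced_minority))
```

In practice one would train a classifier such as a decision tree or naive Bayes on the resampled data and compare `f_measure` on a held-out test set against training on the original data. Resampling is applied to the training split only, so that the evaluation reflects the real class distribution.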