A New Hybrid Sampling Approach for Classification of Imbalanced Datasets

Cited by: 0
Author
Hanskunatai, Anantaporn [1]
Affiliation
[1] King Mongkuts Inst Technol Ladkrabang, Dept Comp Sci, Adv Artificial Intelligence Res Lab, Bangkok 10520, Thailand
Keywords
imbalanced dataset; SMOTE; DBSCAN; hybrid sampling; decision tree; naive Bayes
DOI
Not available
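Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809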
Abstract
We live in a data-driven era. Many organizations around the world, in sectors including banking, industry, commerce, and medicine, aim to extract knowledge from huge amounts of data. However, most real-world datasets suffer from class imbalance. This paper presents a new algorithm for imbalanced classification. The proposed technique is a hybrid sampling approach that combines a well-known oversampling algorithm, SMOTE, with an undersampling technique that removes ambiguous instances from the majority class. The experimental results show that the new hybrid sampling method yields better predictive performance in terms of F-measure than other sampling techniques. In addition, compared with the original dataset, it improves F-measure by up to 59.73% and 412.26% for decision tree and naive Bayes classifiers, respectively.
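The abstract does not spell out how instances are judged ambiguous, so the following is only a minimal sketch of the general idea of hybrid sampling, not the paper's algorithm. It assumes, purely for illustration, that a majority instance is ambiguous when most of its k nearest neighbours belong to the minority class (the paper's keywords suggest DBSCAN is involved instead), and it uses a simplified SMOTE-style interpolation between minority neighbours. All function names and the toy data are hypothetical, and the minority class is assumed to contain at least two points.

```python
import math
import random

def k_nearest(point, candidates, k):
    """Return the k candidates closest to point (Euclidean distance)."""
    return sorted(candidates, key=lambda c: math.dist(point, c))[:k]

def smote_like_oversample(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority points by interpolating between a
    minority point and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbour = rng.choice(k_nearest(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

def remove_ambiguous_majority(majority, minority, k=3):
    """Drop majority points whose k nearest neighbours are mostly minority.
    ASSUMPTION: this ambiguity rule is illustrative, not taken from the paper."""
    kept = []
    for p in majority:
        pool = [(q, 0) for q in majority if q is not p] + [(q, 1) for q in minority]
        nearest = sorted(pool, key=lambda item: math.dist(p, item[0]))[:k]
        minority_votes = sum(label for _, label in nearest)
        if minority_votes * 2 <= k:  # keep unless minority neighbours dominate
            kept.append(p)
    return kept

def hybrid_sample(majority, minority, k=3, seed=0):
    """Undersample ambiguous majority points, then oversample the minority
    class until both classes have the same size."""
    cleaned = remove_ambiguous_majority(majority, minority, k)
    n_new = max(0, len(cleaned) - len(minority))
    synthetic = smote_like_oversample(minority, n_new, k, random.Random(seed))
    return cleaned, minority + synthetic

# Toy data: the last majority point sits inside the minority cluster.
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
            (0.2, 0.4), (0.4, 0.2), (2.0, 2.0)]
minority = [(2.1, 2.0), (1.9, 2.1), (2.0, 1.9)]

new_majority, new_minority = hybrid_sample(majority, minority)
print(len(new_majority), len(new_minority))
```

In this toy run the majority point lying inside the minority cluster is discarded, and synthetic minority points are drawn on segments between existing minority points, so they stay within the minority region. A real experiment would more likely rely on an established library implementation of SMOTE and a clustering step such as DBSCAN.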
Pages: 67-71
Page count: 5