A Hybrid Sampling Method for Imbalanced Data

Authors
Gazzah, Sami [1]
Hechkel, Amina [1]
Ben Amara, Najoua Essoukri [1]
Affiliation
[1] Univ Sousse, Tunisia SAGE, Adv Syst Elect Engn, Natl Engn Sch Sousse, Sousse, Tunisia
Keywords
Imbalanced data sets; Intra-class variations; Data analysis; Principal component analysis; One-against-all SVM; Classification
DOI
Not available
CLC number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
As applications diversify and new, challenging ones emerge, for example in computer vision, classical machine learning systems often perform poorly when confronted with two common problems: negative training examples that outnumber the positive ones, and large intra-class variations. Both problems degrade system performance. In this work, we propose to improve classification accuracy on imbalanced training data by balancing the training set to equal class sizes with a hybrid approach: the minority class is over-sampled using a "SMOTE star topology", and the majority class is under-sampled by removing instances considered less relevant. Feature vectors are removed with respect to intra-class variations, based on a distribution criterion. Experimental results on biometric data show that the proposed approach significantly improves overall performance, measured in terms of true-positive rate.
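The record gives only a one-paragraph summary of the method, so the following Python sketch illustrates the general shape of such hybrid balancing under stated assumptions rather than the authors' implementation. It assumes the "SMOTE star topology" generates synthetic minority samples on the segments joining each minority sample to the minority-class mean. For the under-sampling step, the paper's distribution criterion is not specified here, so the sketch substitutes a hypothetical stand-in: keep the majority samples farthest from the class mean and drop the dense core. The positive class is balanced against all other classes, in the spirit of the one-against-all SVM setting named in the keywords. All function names (`smote_star_oversample`, `undersample_far_from_center`, `hybrid_balance`) are illustrative.

```python
import numpy as np


def smote_star_oversample(X_min, n_new, rng=None):
    """Create n_new synthetic minority samples.
    ASSUMPTION: the star topology places them on segments joining each
    minority sample to the minority-class mean (the hub of the star)."""
    rng = np.random.default_rng(rng)
    center = X_min.mean(axis=0)                      # hub of the star
    idx = rng.integers(0, len(X_min), size=n_new)    # source sample (branch) per new point
    gaps = rng.random((n_new, 1))                    # position along the branch, in [0, 1)
    return X_min[idx] + gaps * (center - X_min[idx])


def undersample_far_from_center(X_maj, n_keep):
    """HYPOTHETICAL relevance criterion (not the paper's): keep the n_keep
    majority samples farthest from the class mean, so the retained set still
    spans the class's intra-class variation, and drop the dense core."""
    center = X_maj.mean(axis=0)
    dist = np.linalg.norm(X_maj - center, axis=1)
    keep = np.argsort(dist)[::-1][:n_keep]           # indices of the farthest samples
    return X_maj[np.sort(keep)]                      # preserve original order


def hybrid_balance(X, y, positive_label, rng=None):
    """One-against-all balancing: positive_label vs. every other class.
    Both sides end at the midpoint of the two original class sizes."""
    pos = y == positive_label
    X_pos, X_neg = X[pos], X[~pos]
    if len(X_pos) > len(X_neg):
        raise ValueError("expected the positive class to be the minority")
    target = (len(X_pos) + len(X_neg)) // 2
    synthetic = smote_star_oversample(X_pos, target - len(X_pos), rng)
    X_pos_bal = np.vstack([X_pos, synthetic])        # over-sampled minority
    X_neg_bal = undersample_far_from_center(X_neg, target)  # under-sampled majority
    X_bal = np.vstack([X_pos_bal, X_neg_bal])
    y_bal = np.concatenate([np.ones(target, dtype=int), np.zeros(target, dtype=int)])
    return X_bal, y_bal


# Usage: 90 negatives vs. 10 positives, balanced to 50 vs. 50.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (90, 2)), rng.normal(3.0, 0.5, (10, 2))])
y = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = hybrid_balance(X, y, positive_label=1, rng=0)
print(X_bal.shape, np.bincount(y_bal))  # (100, 2) [50 50]
```

Balancing to the midpoint of the two class sizes is only one reading of "equally balancing"; splitting the gap between over-sampling and under-sampling differently would fit the abstract just as well. Because every star-topology sample lies between an existing minority sample and the class mean, the synthetic points stay inside the region already covered by the minority class.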
Pages: 6