Under-sampling approaches for improving prediction of the minority class in an imbalanced dataset

Times cited: 0
Authors
Yen, Show-Jane [1]
Lee, Yue-Shi [1]
Affiliation
[1] Ming Chuan Univ, Dept Comp Sci & Informat Engn, Taoyuan 333, Taiwan
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The training data are the most important factor in improving classification accuracy. However, data in real-world applications often have an imbalanced class distribution; that is, most of the data belong to the majority class and only a small portion belong to the minority class. In this case, if all the data are used as training data, the classifier tends to predict that most incoming data belong to the majority class. Hence, it is important to select suitable training data for classification in the imbalanced class distribution problem. In this paper, we propose cluster-based under-sampling approaches that select representative data as training data to improve the classification accuracy for the minority class in the imbalanced class distribution problem. The experimental results show that our cluster-based under-sampling approaches outperform other under-sampling techniques from previous studies.
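As a rough illustration of the cluster-based under-sampling idea described in the abstract, the Python sketch below clusters all training samples with a plain k-means, keeps every minority sample, and draws majority samples from each cluster in proportion to that cluster's majority-to-minority ratio. The abstract does not specify the clustering algorithm or the selection rule, so k-means, the proportional allocation, the majority-to-minority ratio parameter `m`, and the helper names `kmeans_labels` and `cluster_under_sample` are all assumptions for illustration, not the authors' exact method.

```python
import math
import random
from typing import List, Optional, Sequence


def kmeans_labels(points: Sequence[Sequence[float]], k: int,
                  iters: int = 100, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means (hypothetical helper); returns a label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels: Optional[List[int]] = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_under_sample(X: Sequence[Sequence[float]], y: Sequence[int],
                         k: int = 3, m: float = 1.0, minority: int = 1,
                         seed: int = 0) -> List[int]:
    """Return indices of a reduced training set (hypothetical helper).

    Keeps every minority sample and about m * (#minority) majority samples,
    allocated to clusters in proportion to each cluster's
    majority/minority ratio (an assumed allocation rule).
    """
    rng = random.Random(seed)
    labels = kmeans_labels(X, k, seed=seed)
    minority_idx = [i for i, t in enumerate(y) if t == minority]
    majority_by_cluster = {c: [] for c in range(k)}
    minority_count = {c: 0 for c in range(k)}
    for i, (c, t) in enumerate(zip(labels, y)):
        if t == minority:
            minority_count[c] += 1
        else:
            majority_by_cluster[c].append(i)
    # Assumption: a cluster with no minority samples counts as having one.
    ratio = {c: len(majority_by_cluster[c]) / max(minority_count[c], 1)
             for c in range(k)}
    total_ratio = sum(ratio.values())
    if total_ratio == 0:
        return sorted(minority_idx)  # no majority samples to draw from
    target = m * len(minority_idx)  # desired number of majority samples
    chosen = list(minority_idx)
    for c in range(k):
        # Proportional share, capped by what the cluster actually holds.
        n_c = min(round(target * ratio[c] / total_ratio),
                  len(majority_by_cluster[c]))
        chosen.extend(rng.sample(majority_by_cluster[c], n_c))
    return sorted(chosen)


if __name__ == "__main__":
    gen = random.Random(42)
    X = ([(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(90)]
         + [(gen.gauss(3, 0.5), gen.gauss(3, 0.5)) for _ in range(10)])
    y = [0] * 90 + [1] * 10
    idx = cluster_under_sample(X, y, k=3, m=1.0)
    print(len(idx), "samples kept,", sum(y[i] for i in idx), "of them minority")
```

Raising `m` above 1 keeps more majority samples per minority sample. Because clusters that hold many majority samples relative to minority ones receive a larger share, the reduced set tries to preserve the overall shape of the majority class rather than sampling it uniformly at random.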
Pages: 731-740
Page count: 10
Related papers (50 in total)
  • [1] An Under-sampling Method Based on Fuzzy Logic for Large Imbalanced Dataset
    Wong, Ginny Y.
    Leung, Frank H. F.
    Ling, Sai-Ho
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2014, : 1248 - 1252
  • [2] Improving Classification of Imbalanced Student Dataset Using Ensemble Method of Voting, Bagging, and Adaboost with Under-Sampling Technique
    Punlumjeak, Wattana
    Rugtanom, Sitti
    Jantarat, Samatachai
    Rachburee, Nachirat
    [J]. IT CONVERGENCE AND SECURITY 2017, VOL 1, 2018, 449 : 27 - 34
  • [3] Cluster-based under-sampling approaches for imbalanced data distributions
    Yen, Show-Jane
    Lee, Yue-Shi
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (03) : 5718 - 5727
  • [4] A K-means Clustering Based Under-Sampling Method for Imbalanced Dataset Classification
    Huang, Chih-Ming
    Hung, Chuan-Sheng
    Hsu, Yao-Yuan
    Zheng, You-Cheng
    Yu, Cheng-Han
    Lin, Chun-Hung Richard
    Chen, Shi-Huang
    [J]. 38TH INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING, ICOIN 2024, 2024, : 708 - 713
  • [5] A Cluster-Based Under-Sampling Algorithm for Class-Imbalanced Data
    Guzman-Ponce, A.
    Valdovinos, R. M.
    Sanchez, J. S.
    [J]. HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2020, 2020, 12344 : 299 - 311
  • [6] Under-sampling class imbalanced datasets by combining clustering analysis and instance selection
    Tsai, Chih-Fong
    Lin, Wei-Chao
    Hu, Ya-Han
    Yao, Guan-Ting
    [J]. INFORMATION SCIENCES, 2019, 477 : 47 - 54
  • [7] An Improved Under-sampling Imbalanced Classification Algorithm
    Yao, Baofeng
    Wang, Lei
    [J]. 2021 13TH INTERNATIONAL CONFERENCE ON MEASURING TECHNOLOGY AND MECHATRONICS AUTOMATION (ICMTMA 2021), 2021, : 775 - 779
  • [8] Multilabel Over-sampling and Under-sampling with Class Alignment for Imbalanced Multilabel Text Classification
    Taha, Adil Yaseen
    Tiun, Sabrina
    Abd Rahman, Abdul Hadi
    Sabah, Ali
    [J]. JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA, 2021, 20 (03) : 423 - 456
  • [9] Boosting the performance of over-sampling algorithms through under-sampling the minority class
    de Morais, Romero F. A. B.
    Vasconcelos, Germano C.
    [J]. NEUROCOMPUTING, 2019, 343 : 3 - 18
  • [10] Improving Classification Performance for the Minority Class in Highly Imbalanced Dataset using Boosting
    Abouelenien, Mohamed
    Yuan, Xiaohui
    Duraisamy, Prakash
    Yuan, Xiaojing
    [J]. 2012 THIRD INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION & NETWORKING TECHNOLOGIES (ICCCNT), 2012,