Cluster-based sampling approaches to imbalanced data distributions

Cited by: 0
Authors
Yen, Show-Jane [1]
Lee, Yue-Shi [1]
Affiliations
[1] Ming Chuan Univ, Dept Comp Sci & Informat Engn, Taoyuan 333, Taiwan
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In classification problems, the training data significantly influence classification accuracy. When a data set is highly imbalanced, classification algorithms tend to degenerate by assigning all cases to the most common outcome. It is therefore important to select suitable training data when the class distribution is imbalanced. In this paper, we propose cluster-based under-sampling approaches that select representative data as training data, in order to improve classification accuracy under an imbalanced class distribution. A neural network model is used as the base classification algorithm. The experimental results show that our cluster-based under-sampling approaches outperform the under-sampling techniques of previous studies.
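The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one possible reading of cluster-based under-sampling, not the authors' algorithm. It assumes the majority class is clustered with plain k-means and that each cluster contributes samples in proportion to its size. The function names (`kmeans`, `cluster_undersample`), the parameters `k` and `n_keep`, and the synthetic data are all hypothetical.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def cluster_undersample(majority, n_keep, k=3, seed=0):
    """Keep roughly n_keep majority points, spread across k clusters.

    Assumption: each cluster's quota is proportional to its size, so the
    retained points mirror the cluster structure of the majority class.
    """
    labels = kmeans(majority, k, seed=seed)
    rng = random.Random(seed)
    selected = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        quota = round(n_keep * len(members) / len(majority))
        selected.extend(rng.sample(members, min(quota, len(members))))
    return selected

# Synthetic example: 300 majority points in three blobs, 30 minority points.
gen = random.Random(1)
majority = [
    (gen.gauss(cx, 0.5), gen.gauss(cy, 0.5))
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(100)
]
minority = [(gen.gauss(2.5, 0.5), gen.gauss(2.5, 0.5)) for _ in range(30)]

balanced_majority = cluster_undersample(majority, n_keep=len(minority))
training_set = [(p, 0) for p in balanced_majority] + [(p, 1) for p in minority]
```

Because the per-cluster quotas are rounded, the number of retained majority points can differ from `n_keep` by a small amount. The resulting `training_set` could then be passed to any classifier, such as the neural network model the abstract mentions.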
Pages: 427-436
Page count: 10