CLUS: A New Hybrid Sampling Classification for Imbalanced Data

Citations: 0
Authors
Prachuabsupakij, Wanthanee [1]
Affiliations
[1] King Mongkuts Univ Technol North Bangkok, Fac Ind Technol & Management, Dept Informat Technol, Prachin Buri, Thailand
Keywords
imbalanced data; classification; clustering; sampling; data mining
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
This paper proposes CLUS (CLUSter-based hybrid sampling), a new hybrid sampling approach for improving classifier performance on two-class imbalanced datasets. The objective is to develop an algorithm that can effectively classify two-class imbalanced datasets with complicated distributions and large overlap between classes, problems that can cause learners to fail. CLUS therefore aims to alleviate the overlap between classes and to balance the class distribution. First, all instances are partitioned into k clusters using the k-means algorithm. Next, CLUS creates new subsets, each consisting of instances from different classes that have different characteristics. Then an oversampling method is applied to each subset. Finally, SVMs are trained on the resulting training sets and their predictions are combined by majority vote. CLUS is tested on eight imbalanced benchmark datasets and assessed with two metrics, F-measure and AUC. The experimental results show that CLUS outperforms other methods, especially when the imbalance ratio is high.
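The pipeline outlined in the abstract (cluster, build subsets, oversample, train SVMs, vote) might be sketched roughly as follows. This is an illustrative sketch only, not the author's implementation: the abstract does not say how subsets are formed, so the code assumes each subset pairs the majority-class instances of one cluster with all minority-class instances. The function names (`kmeans`, `oversample`, `train_linear_svm`, `clus_fit`, `clus_predict`), the plain random oversampling, and the linear SVM trained by Pegasos-style subgradient descent are likewise assumptions chosen to keep the example dependency-free. Labels are assumed to be -1 and +1.

```python
import random
from collections import Counter

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def oversample(X, y, rng):
    """Random oversampling: duplicate rows of smaller classes until all match the largest."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent (labels -1/+1)."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - eta * lam) for wj in w]      # regularization shrink
            if margin < 1:                              # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def svm_predict(model, x):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def clus_fit(X, y, k=2, seed=0):
    """Train one SVM per cluster-derived subset (subset construction is an assumption)."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    assign = kmeans(X, k, seed=seed)
    minority_rows = [x for x, t in zip(X, y) if t == minority]
    models = []
    for c in range(k):
        # Assumed rule: majority-class rows of cluster c plus every minority-class row.
        majority_rows = [x for x, t, a in zip(X, y, assign) if t != minority and a == c]
        if not majority_rows:
            continue
        Xs = majority_rows + minority_rows
        ys = [-minority] * len(majority_rows) + [minority] * len(minority_rows)
        Xs, ys = oversample(Xs, ys, rng)
        models.append(train_linear_svm(Xs, ys, seed=seed))
    return models

def clus_predict(models, x):
    """Majority vote over the per-subset SVMs; ties go to +1."""
    votes = sum(svm_predict(m, x) for m in models)
    return 1 if votes >= 0 else -1
```

With a list of feature rows `X` and matching -1/+1 labels `y`, calling `models = clus_fit(X, y, k=2)` and then `clus_predict(models, x)` for a new row `x` returns the voted label. A kernel SVM and a synthetic oversampler could be swapped in where the paper's actual choices are known.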
Pages: 281-286
Page count: 6
Related papers
50 in total
  • [42] Hybrid sampling-based contrastive learning for imbalanced node classification
    Cui, Caixia
    Wang, Jie
    Wei, Wei
    Liang, Jiye
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (03) : 989 - 1001
  • [43] Learning From Imbalanced Data With Deep Density Hybrid Sampling
    Liu, Chien-Liang
    Chang, Yu-Hua
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (11) : 7065 - 7077
  • [44] Hybrid probabilistic sampling with random subspace for imbalanced data learning
    Cao, Peng
    Zhao, Dazhe
    Zaiane, Osmar
    [J]. INTELLIGENT DATA ANALYSIS, 2014, 18 (06) : 1089 - 1108
  • [45] Hybrid Sampling Method for Overlap Region of ICS Imbalanced Data
    Gao, Bing
    Gu, Zhaojun
    Zhou, Jingxian
    Sui, He
    [J]. Computer Engineering and Applications, 2023, 59 (19) : 305 - 315
  • [46] A new Monte Carlo sampling method based on Gaussian Mixture Model for imbalanced data classification
    Chen, Gang
    Hou, Binjie
    Lei, Tiangang
    [J]. MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, 20 (10) : 17866 - 17885
  • [47] Improving the classification performance on imbalanced data sets via new hybrid parameterisation model
    Mohamad, Masurah
    Selamat, Ali
    Subroto, Imam Much
    Krejcar, Ondrej
    [J]. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2021, 33 (07) : 787 - 797
  • [48] A New Improved Boosting for Imbalanced Data Classification
    Zhang, Zongtang
    Qiu, JiaXing
    Dai, Weiguo
    [J]. 2019 THE 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, CONTROL AND ROBOTICS (EECR 2019), 2019, 533
  • [49] HSDP: A Hybrid Sampling Method for Imbalanced Big Data Based on Data Partition
    Chen, Liping
    Jiang, Jiabao
    Zhang, Yong
    [J]. COMPLEXITY, 2021, 2021
  • [50] Hybrid Sampling and Dynamic Weighting-Based Classification Method for Multi-Class Imbalanced Data Stream
    Han, Meng
    Li, Ang
    Gao, Zhihui
    Mu, Dongliang
    Liu, Shujuan
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (10):