CLUS: A New Hybrid Sampling Classification for Imbalanced Data

Times cited: 0
Author
Prachuabsupakij, Wanthanee [1]
Affiliation
[1] King Mongkuts Univ Technol North Bangkok, Fac Ind Technol & Management, Dept Informat Technol, Prachin Buri, Thailand
Keywords
imbalanced data; classification; clustering; sampling; data mining
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification code
081202; 0835
Abstract
This paper proposes a new hybrid sampling approach, CLUS (CLUSter-based hybrid sampling), to improve classifier performance on two-class imbalanced datasets. The objective of this research is to develop an algorithm that can effectively classify two-class imbalanced datasets with complicated distributions and large overlap between classes, problems that can cause learners to fail. The contribution of CLUS is therefore to alleviate the large overlap between classes and to balance the class distribution. First, all instances are partitioned into k clusters using the k-means algorithm. Next, CLUS creates new subsets, each consisting of instances from different classes that have different characteristics. Then, an oversampling method is applied to each subset. Finally, an SVM is trained on each resulting training set, and the SVMs classify by majority vote. CLUS is tested on eight imbalanced benchmark datasets and assessed with two metrics: F-measure and AUC. The experimental results show that CLUS outperforms the compared methods, especially when the imbalance ratio is high.
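The abstract describes CLUS as a pipeline: k-means clustering, subset construction, oversampling within each subset, one SVM per training set, and a majority vote. The following is a minimal pure-Python sketch of that pipeline under stated assumptions, not the author's implementation. The abstract does not say how subsets are formed or which oversampling method is used, so the subset rule (the majority instances of one cluster plus all minority instances), random oversampling, the linear Pegasos-style SVM, and every function name (`clus_fit`, `clus_predict`, and so on) are hypothetical. The minority class is labeled +1 and the majority class -1.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def random_oversample(X, y, seed=0):
    """Assumed oversampler: duplicate random instances of the smaller class until balanced."""
    rng = random.Random(seed)
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == -1]
    small, big, small_label = (pos, neg, 1) if len(pos) < len(neg) else (neg, pos, -1)
    extra = [rng.choice(small) for _ in range(len(big) - len(small))]
    return X + extra, y + [small_label] * len(extra)

def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Linear SVM via Pegasos-style SGD on the hinge loss.
    A constant 1.0 feature is appended so the bias is learned as an ordinary weight."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage
            if margin < 1.0:  # hinge loss is active for this instance
                w = [wj + eta * t * xj for wj, xj in zip(w, x)]
    return w

def svm_predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

def clus_fit(X, y, k=3, seed=0):
    """Hypothetical CLUS-style training: cluster, build subsets, oversample, one SVM each."""
    labels = kmeans(X, k, seed=seed)
    minority = [(x, t) for x, t in zip(X, y) if t == 1]
    models = []
    for c in range(k):
        # Assumed subset rule: majority instances of cluster c plus every minority instance.
        majority_c = [(x, t) for x, t, lab in zip(X, y, labels) if t == -1 and lab == c]
        if not majority_c:
            continue  # skip clusters that contain no majority instances
        subset = majority_c + minority
        Xs, ys = random_oversample([x for x, _ in subset], [t for _, t in subset], seed=seed + c)
        models.append(train_linear_svm(Xs, ys, seed=seed + c))
    return models

def clus_predict(models, x):
    """Majority vote over the per-subset SVMs; a tie goes to the majority class (-1)."""
    votes = sum(svm_predict(w, x) for w in models)
    return 1 if votes > 0 else -1

def f_measure(y_true, y_pred, positive=1):
    """F-measure (F1) of the positive (minority) class: 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Toy imbalanced data (40 majority vs 6 minority), separable by construction.
rng = random.Random(1)
X = [(rng.uniform(-4, -2), rng.uniform(-4, -2)) for _ in range(40)]  # majority, label -1
X += [(rng.uniform(2, 4), rng.uniform(2, 4)) for _ in range(6)]      # minority, label +1
y = [-1] * 40 + [1] * 6
models = clus_fit(X, y, k=3)
preds = [clus_predict(models, x) for x in X]
print(f_measure(y, preds))  # prints 1.0 on this separable toy set
```

The toy data carry none of the class overlap the paper targets, so the perfect score only shows that the pieces fit together. A kernel SVM (for example scikit-learn's `SVC`) or SMOTE could be swapped in for the linear SVM and the random oversampler without changing the overall cluster, oversample, train, vote structure.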
Pages: 281-286
Page count: 6