CLUS: A New Hybrid Sampling Classification for Imbalanced Data

Cited by: 0
Authors
Prachuabsupakij, Wanthanee [1]
Affiliations
[1] King Mongkuts Univ Technol North Bangkok, Fac Ind Technol & Management, Dept Informat Technol, Prachin Buri, Thailand
Keywords
imbalanced data; classification; clustering; sampling; data mining
DOI
Not available
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
This paper proposes CLUS (CLUSter-based hybrid sampling), a new hybrid sampling approach for improving classifier performance on two-class imbalanced datasets. The objective of this research is to develop an algorithm that can effectively classify two-class imbalanced datasets with complicated distributions and large overlap between classes, problems that can cause learners to fail at classification. The contribution of CLUS is therefore to alleviate the large overlap between classes and to balance the class distribution. First, all instances are partitioned into k clusters using the k-means algorithm. Next, CLUS creates new subsets, each consisting of instances from different classes that have different characteristics. An oversampling method is then applied to each subset. Finally, SVMs are trained on the resulting training sets and their predictions are combined by majority vote. CLUS is tested on eight imbalanced benchmark datasets and assessed with two metrics: F-measure and AUC. The experimental results show that CLUS outperforms other methods, especially when the imbalance ratio is high.
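
The four stages named in the abstract (k-means partitioning, subset construction, oversampling, SVM majority vote) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the abstract does not give the subset rule, the oversampling method, or the SVM configuration. The sketch assumes each subset pairs the majority instances of one cluster with all minority instances, uses random duplication for oversampling, and substitutes a bias-free linear SVM trained by SGD on the hinge loss. All function names are hypothetical, and labels are assumed to be -1 (majority) and +1 (minority).

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def build_balanced_subsets(X, y, k, seed=0):
    """ASSUMED subset rule: majority points of one cluster plus all minority points."""
    rng = random.Random(seed)
    clusters = kmeans(X, k, seed=seed)
    minority = [x for x, lab in zip(X, y) if lab == 1]
    subsets = []
    for c in range(k):
        majority = [x for x, lab, cl in zip(X, y, clusters) if lab == -1 and cl == c]
        if not majority:
            continue  # a cluster without majority points yields no subset
        boosted = list(minority)
        while len(boosted) < len(majority):  # ASSUMED: oversample by random duplication
            boosted.append(rng.choice(minority))
        subsets.append((majority + boosted, [-1] * len(majority) + [1] * len(boosted)))
    return subsets

def train_linear_svm(X, y, epochs=50, eta=0.01, lam=0.01, seed=0):
    """Stand-in learner: bias-free linear SVM, SGD on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [wj * (1 - eta * lam) for wj in w]  # regularization shrinkage
            if margin < 1:  # hinge loss is active: step toward the sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def clus_predict(models, x):
    """Majority vote over the per-subset SVMs (ties go to the minority class)."""
    votes = sum(1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1 for w in models)
    return 1 if votes >= 0 else -1

# Toy data: 8 majority points (-1) and 2 minority points (+1), on opposite sides of the origin.
X = [(-1.0, -1.2), (-1.5, -0.8), (-0.9, -1.6), (-1.3, -1.1),
     (-5.0, -4.0), (-5.5, -4.4), (-4.8, -3.7), (-5.2, -4.1),
     (1.0, 1.0), (1.6, 1.2)]
y = [-1] * 8 + [1] * 2
subsets = build_balanced_subsets(X, y, k=2)
models = [train_linear_svm(Xs, ys) for Xs, ys in subsets]
predictions = [clus_predict(models, x) for x in [(2.0, 1.5), (-3.0, -2.0)]]
```

Because the stand-in SVM has no bias term, the toy classes are placed on opposite sides of the origin. Real data would need centering or a learner with an intercept, such as a library SVM.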
Pages: 281-286
Page count: 6
Related papers
50 in total
  • [1] A Hybrid Sampling SVM Approach to Imbalanced Data Classification
    Wang, Qiang
    [J]. ABSTRACT AND APPLIED ANALYSIS, 2014,
  • [2] A New Hybrid Sampling Approach for Classification of Imbalanced Datasets
    Hanskunatai, Anantaporn
    [J]. PROCEEDINGS OF 2018 3RD INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION SYSTEMS (ICCCS), 2018, : 67 - 71
  • [3] Hybrid sampling for imbalanced data
    Seiffert, Chris
    Khoshgoftaar, Taghi M.
    Van Hulse, Jason
    [J]. PROCEEDINGS OF THE 2008 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION, 2008, : 202 - 207
  • [4] Hybrid sampling for imbalanced data
    Seiffert, Chris
    Khoshgoftaar, Taghi M.
    Van Hulse, Jason
    [J]. INTEGRATED COMPUTER-AIDED ENGINEERING, 2009, 16 (03) : 193 - 210
  • [5] HSDLM: A Hybrid Sampling With Deep Learning Method for Imbalanced Data Classification
    Hasib, Khan Md
    Towhid, Nurul Akter
    Islam, Md Rafiqul
    [J]. INTERNATIONAL JOURNAL OF CLOUD APPLICATIONS AND COMPUTING, 2021, 11 (04) : 1 - 13
  • [6] A cluster-based hybrid sampling approach for imbalanced data classification
    Feng, Shou
    Zhao, Chunhui
    Fu, Ping
    [J]. REVIEW OF SCIENTIFIC INSTRUMENTS, 2020, 91 (05):
  • [7] A New Hybrid Under-sampling Approach to Imbalanced Classification Problems
    Peng, Chun-Yang
    Park, You-Jin
    [J]. APPLIED ARTIFICIAL INTELLIGENCE, 2022, 36 (01)
  • [8] A Hybrid Sampling Method for Imbalanced Data
    Gazzah, Sami
    Hechkel, Amina
    Ben Amara, Najoua Essoukri
    [J]. 2015 IEEE 12TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS & DEVICES (SSD), 2015,
  • [9] A hybrid sampling method for highly imbalanced and overlapped data classification with complex distribution
    Liu, Yansong
    Zhu, Li
    Ding, Lei
    Sui, He
    Shang, Wenli
    [J]. INFORMATION SCIENCES, 2024, 661
  • [10] A New Sampling Approach for Classification of Imbalanced Data sets with High Density
    Jia Pengfei
    Zhang Chunkai
    He Zhenyu
    [J]. 2014 INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2014, : 217 - 222