Exploring of clustering algorithm on class-imbalanced data

Cited by: 0
Authors
Li Xuan [1]
Chen Zhigang [1]
Yang Fan [1]
Affiliation
[1] Xiamen Univ, Dept Automat, Xiamen 361005, Fujian, Peoples R China
Keywords
Class-imbalanced Data; Clustering Algorithm; Imbalanced-ratios; CLASSIFICATION;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Imbalanced data distribution remains an unsolved problem in data mining and machine learning. This paper introduces the problem of class-imbalanced data in classification learning and extends it to clustering, since data clustering is an important and frequently used unsupervised learning method. Two verification methods, based on two different aspects of the original data, are proposed to test and verify the influence of class-imbalanced data on clustering. Furthermore, we conduct experiments on different imbalance ratios to explore their importance for clustering algorithms, since the imbalance ratio is a very important factor for performance in classification learning. Experimental results indicate that class imbalance in the dataset can seriously influence the final performance and efficiency of the clustering algorithm, and that the higher the imbalance ratio, the stronger the adverse effect on clustering performance.
Pages: 89-93
Page count: 5