Clustering-based undersampling with random over sampling examples and support vector machine for imbalanced classification of breast cancer diagnosis

Cited by: 46
Authors
Zhang, Jue [1 ,2 ]
Chen, Li [1 ]
Affiliations
[1] Northwest Univ, Sch Informat & Technol, Xian, Shaanxi, Peoples R China
[2] Yulin Univ, Sch Informat Engn, Yulin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Breast cancer diagnosis; class-imbalance problem; sample selection; FEATURE-SELECTION; K-MEANS; CLASSIFIERS; ALGORITHM;
DOI
10.1080/24699322.2019.1649074
Chinese Library Classification number
R61 [Operative surgery];
Discipline classification code
Abstract
To address the two-class imbalanced classification problem in breast cancer diagnosis, a hybrid model based on sample selection is proposed that combines Random Over Sampling Examples (ROSE), K-means, and a Support Vector Machine (SVM), called RK-SVM. ROSE is used to balance the dataset and thereby improve the diagnostic accuracy of the SVM. Clustering contributes a distinct sample selection factor that encourages selecting samples near the class boundary; its purpose is to reduce the risk of removing useful samples and to improve the efficiency of sample selection. To test the performance of the new hybrid classifier, it is applied to breast cancer datasets and three other datasets from the University of California Irvine (UCI) machine learning repository that are commonly used in class-imbalanced learning. The experimental results show that the proposed hybrid method outperforms most of the competing algorithms in terms of G-mean and accuracy. The results also show that the method performs strongly on binary problems.
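The abstract reports both G-mean and accuracy. As a minimal sketch (not taken from the paper), the snippet below uses the standard definition of G-mean as the geometric mean of sensitivity and specificity to show why accuracy alone can mislead on imbalanced data. The function names `g_mean` and `accuracy`, the toy labels, and the two hypothetical classifiers are assumptions made for illustration only.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against a class that is absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy imbalanced set (illustrative, not from the paper):
# 90 negatives (0) followed by 10 positives (1).
y_true = [0] * 90 + [1] * 10

# Hypothetical classifier A: always predicts the majority class.
always_negative = [0] * 100

# Hypothetical classifier B: 81 true negatives, 9 false positives,
# 8 true positives, 2 false negatives.
balanced_pred = [0] * 81 + [1] * 9 + [1] * 8 + [0] * 2

for name, pred in [("A", always_negative), ("B", balanced_pred)]:
    print(name, round(accuracy(y_true, pred), 3), round(g_mean(y_true, pred), 3))
```

In this toy case the two classifiers have nearly the same accuracy (0.90 and 0.89), but classifier A has a G-mean of 0 because it never detects the minority class, which is why imbalanced-learning studies commonly report G-mean alongside accuracy.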
Pages: 62-72
Page count: 11
Related papers
50 in total
  • [1] EUSC: A clustering-based surrogate model to accelerate evolutionary undersampling in imbalanced classification
    Hoang Lam Le
    Landa-Silva, Dario
    Galar, Mikel
    Garcia, Salvador
    Triguero, Isaac
    [J]. APPLIED SOFT COMPUTING, 2021, 101
  • [2] Combine Sampling Support Vector Machine for Imbalanced Data Classification
    Sain, Hartayuni
    Purnami, Santi Wulan
    [J]. THIRD INFORMATION SYSTEMS INTERNATIONAL CONFERENCE 2015, 2015, 72 : 59 - 66
  • [3] Clustering-Based Support Vector Machine (SVM) for Symptomatic Knee Osteoarthritis Severity Classification
    Halim, Husnir Nasyuha Abdul
    Azaman, Aizreena
    [J]. 2022 9TH INTERNATIONAL CONFERENCE ON BIOMEDICAL AND BIOINFORMATICS ENGINEERING, ICBBE 2022, 2022, : 140 - 146
  • [4] Breast Cancer Diagnosis Based on Support Vector Machine
    Gao, Shang
    Li, Hongmei
    [J]. 2012 2ND INTERNATIONAL CONFERENCE ON UNCERTAINTY REASONING AND KNOWLEDGE ENGINEERING (URKE), 2012, : 240 - 243
  • [5] An Adaptive Pre-clustering Support Vector Machine for Binary Imbalanced Classification
    Di, Zonglin
    Yao, Siya
    Kang, Qi
    Zhou, Mengchu
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018, : 681 - 686
  • [6] Nonlinear clustering-based support vector machine for large data sets
    Wang, Yongqiao
    Zhang, Xun
    Wang, Souyang
    Lai, K. K.
    [J]. OPTIMIZATION METHODS & SOFTWARE, 2008, 23 (04): : 533 - 549
  • [7] Rotating Machinery Fault Diagnosis for Imbalanced Data Based on Fast Clustering Algorithm and Support Vector Machine
    Zhang, Xiaochen
    Jiang, Dongxiang
    Han, Te
    Wang, Nanfei
    Yang, Wenguang
    Yang, Yizhou
    [J]. JOURNAL OF SENSORS, 2017, 2017
  • [8] Combining Re-sampling with Twin Support Vector Machine for Imbalanced Data Classification
    Cao, Lu
    Shen, Hong
    [J]. 2016 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT), 2016, : 325 - 329
  • [9] A Distance-Based Weighted Undersampling Scheme for Support Vector Machines and its Application to Imbalanced Classification
    Kang, Qi
    Shi, Lei
    Zhou, MengChu
    Wang, XueSong
    Wu, Qidi
    Wei, Zhi
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (09) : 4152 - 4165
  • [10] Homogeneous Ensemble based Support Vector Machine in Breast Cancer Diagnosis
    El Ouassif, Bouchra
    Idri, Ali
    Hosni, Mohamed
    [J]. HEALTHINF: PROCEEDINGS OF THE 14TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES - VOL. 5: HEALTHINF, 2021, : 352 - 360