Dealing with high-dimensional class-imbalanced datasets: Embedded feature selection for SVM classification

Cited by: 166
Authors
Maldonado, Sebastian [1]
Lopez, Julio [2]
Affiliations
[1] Univ Los Andes, Fac Ingn & Ciencias Aplicadas, Monsenor Alvaro del Portillo 12455, Santiago, Chile
[2] Univ Diego Portales, Fac Ingn & Ciencias, Ejercito 441, Santiago, Chile
Keywords
Feature selection; Support Vector Data Description; Cost-sensitive learning; Embedded approaches; Imbalanced data classification; MICROARRAY DATA; SUPPORT; CARCINOMAS; SURVIVAL
DOI
10.1016/j.asoc.2018.02.051
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this work, we propose a novel feature selection approach designed to deal with two major issues in machine learning, namely class imbalance and high dimensionality. The proposed embedded strategy penalizes the cardinality of the feature set via the scaling factors technique, and is used with two support vector machine (SVM) formulations designed to deal with the class-imbalance problem, namely Cost-Sensitive SVM and Support Vector Data Description. The proposed concave formulations are solved via a Quasi-Newton update and Armijo line search. We performed experiments on 12 highly imbalanced microarray datasets using linear and Gaussian kernels, achieving the highest average predictive performance with our approach compared with the most well-known feature selection strategies. © 2018 Elsevier B.V. All rights reserved.
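The abstract names a Quasi-Newton update combined with an Armijo line search as the solver. The following is only a minimal sketch of that generic optimization pattern, not the authors' formulation: it assumes a BFGS inverse-Hessian update, a backtracking Armijo rule, and a toy convex quadratic objective chosen purely for illustration. The function names `armijo_step` and `bfgs_armijo`, the constants, and the test problem are all hypothetical.

```python
import numpy as np


def armijo_step(f, x, fx, grad, direction, alpha0=1.0, shrink=0.5, c1=1e-4, max_iter=50):
    """Backtrack until the Armijo sufficient-decrease condition holds."""
    slope = float(grad @ direction)  # directional derivative, negative for descent
    alpha = alpha0
    for _ in range(max_iter):
        if f(x + alpha * direction) <= fx + c1 * alpha * slope:
            return alpha
        alpha *= shrink
    return alpha


def bfgs_armijo(f, grad_f, x0, tol=1e-8, max_iter=200):
    """Quasi-Newton (BFGS) iteration with an Armijo backtracking line search."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    identity = np.eye(n)
    H = identity.copy()  # approximation of the inverse Hessian
    g = grad_f(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g
        if g @ d >= 0:  # safeguard: reset to steepest descent if d is not a descent direction
            H = identity.copy()
            d = -g
        alpha = armijo_step(f, x, f(x), g, d)
        s = alpha * d
        x_new = x + s
        g_new = grad_f(x_new)
        y = g_new - g
        sy = float(s @ y)
        if sy > 1e-12:  # update only when the curvature condition holds
            rho = 1.0 / sy
            H = (identity - rho * np.outer(s, y)) @ H @ (identity - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x


# Toy problem (an assumption, unrelated to the paper's SVM objectives):
# minimize 0.5 * x^T A x - b^T x, whose analytic minimizer solves A x = b, i.e. [0.2, 0.4].
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = bfgs_armijo(lambda x: 0.5 * x @ A @ x - b @ x, lambda x: A @ x - b, np.zeros(2))
print(x_star)
```

In the paper's setting the variables would presumably be the feature scaling factors and the objective would include the cardinality penalty; those details are not given in this record and are not reproduced here.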
Pages: 94-105
Page count: 12
Related papers
50 in total
  • [1] Research on classification method of high-dimensional class-imbalanced datasets based on SVM
    Zhang, Chunkai
    Zhou, Ying
    Guo, Jianwei
    Wang, Guoquan
    Wang, Xuan
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10 (07) : 1765 - 1778
  • [3] Online feature selection for high-dimensional class-imbalanced data
    Zhou, Peng
    Hu, Xuegang
    Li, Peipei
    Wu, Xindong
    [J]. KNOWLEDGE-BASED SYSTEMS, 2017, 136 : 187 - 199
  • [4] Research On Classification Method Of High-Dimensional Class-Imbalanced Data Sets Based On SVM
    Zhang, Chunkai
    Guo, Jianwei
    Lu, Junru
    [J]. 2017 IEEE SECOND INTERNATIONAL CONFERENCE ON DATA SCIENCE IN CYBERSPACE (DSC), 2017, : 60 - 67
  • [5] An iterative SVM approach to feature selection and classification in high-dimensional datasets
    Liu, Dehua
    Qian, Hui
    Dai, Guang
    Zhang, Zhihua
    [J]. PATTERN RECOGNITION, 2013, 46 (09) : 2531 - 2537
  • [7] Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines
    Maldonado, Sebastian
    Weber, Richard
    Famili, Fazel
    [J]. INFORMATION SCIENCES, 2014, 286 : 228 - 246
  • [8] Class prediction for high-dimensional class-imbalanced data
    Blagus, Rok
    Lusa, Lara
    [J]. BMC BIOINFORMATICS, 2010, 11 : 523
  • [9] Class-imbalanced classifiers for high-dimensional data
    Lin, Wei-Jiun
    Chen, James J.
    [J]. BRIEFINGS IN BIOINFORMATICS, 2013, 14 (01) : 13 - 26
  • [10] SMOTE for high-dimensional class-imbalanced data
    Rok Blagus
    Lara Lusa
    [J]. BMC Bioinformatics, 14