Cost-sensitive Feature Selection for Support Vector Machines

Cited by: 35
Authors
Benitez-Pena, S. [1 ,2 ]
Blanquero, R. [1 ,2 ]
Carrizosa, E. [1 ,2 ]
Ramirez-Cobo, P. [1 ,3 ]
Affiliations
[1] Univ Seville, IMUS, E-41012 Seville, Spain
[2] Univ Seville, Dept Estadist & Invest Operat, E-41012 Seville, Spain
[3] Univ Cadiz, Dept Estadist & Invest Operat, Cadiz 11510, Spain
Keywords
Classification; Data Science; Support Vector Machines; Feature Selection; Integer Programming; Sparsity; OPERATIONS-RESEARCH; CLASSIFICATION; SYNERGIES;
DOI
10.1016/j.cor.2018.03.005
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Feature Selection is a crucial procedure in Data Science tasks such as Classification, since it identifies the relevant variables, thus making the classification procedures more interpretable, cheaper in terms of measurement, and more effective by reducing noise and overfitting. The relevance of features in a classification procedure is linked to the fact that misclassification costs are frequently asymmetric, since false positive and false negative cases may have very different consequences. However, off-the-shelf Feature Selection procedures seldom take such cost-sensitivity of errors into account. In this paper we propose a mathematical-optimization-based Feature Selection procedure embedded in one of the most popular classification procedures, namely Support Vector Machines, accommodating asymmetric misclassification costs. The key idea is to replace the traditional margin maximization by minimizing the number of features selected, while imposing upper bounds on the false positive and false negative rates. The problem is written as an integer linear problem plus a quadratic convex problem for Support Vector Machines with both linear and radial kernels. The reported numerical experience demonstrates the usefulness of the proposed Feature Selection procedure. Indeed, our results on benchmark data sets show that a substantial decrease in the number of features is obtained, whilst the desired trade-off between false positive and false negative rates is achieved. (C) 2018 Elsevier Ltd. All rights reserved.
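The key idea in the abstract (select as few features as possible while keeping the false positive and false negative rates below given bounds) can be illustrated with a minimal sketch. This is not the paper's integer linear formulation and it does not use a Support Vector Machine: as a stand-in, it assumes a nearest-centroid classifier and brute-force enumeration of feature subsets, which is only practical for a handful of features. All function names, the toy data, and the bounds `max_fpr` and `max_fnr` are hypothetical.

```python
from itertools import combinations

def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def predict(x, c_pos, c_neg):
    """Nearest-centroid rule (a stand-in for the SVM, by assumption)."""
    d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
    return 1 if d_pos <= d_neg else -1

def error_rates(X, y, features):
    """Return (false positive rate, false negative rate) on the given data
    when only the listed feature indices are used."""
    proj = [[row[k] for k in features] for row in X]
    pos = [p for p, label in zip(proj, y) if label == 1]
    neg = [p for p, label in zip(proj, y) if label == -1]
    c_pos, c_neg = centroid(pos), centroid(neg)
    fp = sum(predict(p, c_pos, c_neg) == 1 for p in neg)
    fn = sum(predict(p, c_pos, c_neg) == -1 for p in pos)
    return fp / len(neg), fn / len(pos)

def smallest_feasible_subset(X, y, max_fpr, max_fnr):
    """Try subsets in order of increasing size and return the first one
    whose error rates respect both upper bounds, or None if none does."""
    n_features = len(X[0])
    for size in range(1, n_features + 1):
        for features in combinations(range(n_features), size):
            fpr, fnr = error_rates(X, y, features)
            if fpr <= max_fpr and fnr <= max_fnr:
                return features, fpr, fnr
    return None

# Toy data (hypothetical): feature 1 separates the classes,
# features 0 and 2 are mostly noise.
X = [
    [0.0, 1.0, 5.0],
    [1.0, 2.0, 5.1],
    [0.0, 3.0, 4.9],
    [1.0, 7.0, 5.0],
    [0.0, 8.0, 5.2],
    [1.0, 9.0, 4.8],
]
y = [1, 1, 1, -1, -1, -1]

result = smallest_feasible_subset(X, y, max_fpr=0.2, max_fnr=0.2)
print(result)
```

Tightening or loosening the two bounds independently is what makes the selection cost-sensitive: a looser bound on one error type can admit a different, possibly smaller, feature subset.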
Pages: 169 - 178
Number of pages: 10
Related papers
50 in total
  • [41] COST-SENSITIVE FEATURE SELECTION BASED ON LABEL SIGNIFICANCE AND POSITIVE REGION
    Huang, Jintao
    Qian, Wenbin
    Wu, Binglong
    Wang, Yinglong
    PROCEEDINGS OF 2019 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), 2019 : 403 - 409
  • [42] Fast and efficient lung disease classification using hierarchical one-against-all support vector machine and cost-sensitive feature selection
    Chang, Yongjun
    Kim, Namkug
    Lee, Youngjoo
    Lim, Jonghyuck
    Seo, Joon Beom
    Lee, Young Kyung
    COMPUTERS IN BIOLOGY AND MEDICINE, 2012, 42 (12) : 1157 - 1164
  • [43] A Cost-Sensitive Feature Selection Method for High-Dimensional Data
    An, Chaojie
    Zhou, Qifeng
    14TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND EDUCATION (ICCSE 2019), 2019 : 1089 - 1094
  • [44] A Cost-Sensitive Semi-Supervised Support Vector Machine Algorithm
    Han, Min
    Wang, Zhao
    Sun, Zhaoxu
    Xu, Yongli
    Jiang, Nan
    2012 THIRD INTERNATIONAL CONFERENCE ON THEORETICAL AND MATHEMATICAL FOUNDATIONS OF COMPUTER SCIENCE (ICTMF 2012), 2013, 38 : 238 - 244
  • [45] Sequential Cost-Sensitive Feature Acquisition
    Contardo, Gabriella
    Denoyer, Ludovic
    Artieres, Thierry
    ADVANCES IN INTELLIGENT DATA ANALYSIS XV, 2016, 9897 : 284 - 294
  • [46] Cost-Sensitive Support Vector Machine for Semi-Supervised Learning
    Qi, Zhiquan
    Tian, Yingjie
    Shi, Yong
    Yu, Xiaodan
    2013 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, 2013, 18 : 1684 - 1689
  • [47] Experiments with cost-sensitive feature evaluation
    Robnik-Sikonja, M
    MACHINE LEARNING: ECML 2003, 2003, 2837 : 325 - 336
  • [48] Cost-sensitive feature acquisition and classification
    Ji, Shihao
    Carin, Lawrence
    PATTERN RECOGNITION, 2007, 40 (05) : 1474 - 1485
  • [49] Synchronized feature selection for Support Vector Machines with twin hyperplanes
    Maldonado, Sebastian
    Lopez, Julio
    KNOWLEDGE-BASED SYSTEMS, 2017, 132 : 119 - 128
  • [50] Feature Selection Based On Linear Twin Support Vector Machines
    Yang, Zhi-Min
    He, Jun-Yun
    Shao, Yuan-Hai
    FIRST INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND QUANTITATIVE MANAGEMENT, 2013, 17 : 1039 - 1046