Cost-sensitive Feature Selection for Support Vector Machines

Cited by: 35
Authors
Benitez-Pena, S. [1, 2]
Blanquero, R. [1, 2]
Carrizosa, E. [1, 2]
Ramirez-Cobo, P. [1, 3]
Affiliations
[1] Univ Seville, IMUS, E-41012 Seville, Spain
[2] Univ Seville, Dept Estadist & Invest Operat, E-41012 Seville, Spain
[3] Univ Cadiz, Dept Estadist & Invest Operat, Cadiz 11510, Spain
Keywords
Classification; Data Science; Support Vector Machines; Feature Selection; Integer Programming; Sparsity; OPERATIONS-RESEARCH; CLASSIFICATION; SYNERGIES;
DOI
10.1016/j.cor.2018.03.005
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Feature Selection is a crucial procedure in Data Science tasks such as Classification, since it identifies the relevant variables, thus making the classification procedures more interpretable, cheaper in terms of measurement, and more effective by reducing noise and data overfit. The relevance of features in a classification procedure is linked to the fact that misclassification costs are frequently asymmetric, since false positive and false negative cases may have very different consequences. However, off-the-shelf Feature Selection procedures seldom take into account such cost-sensitivity of errors. In this paper we propose a mathematical-optimization-based Feature Selection procedure embedded in one of the most popular classification procedures, namely Support Vector Machines, accommodating asymmetric misclassification costs. The key idea is to replace the traditional margin maximization by minimizing the number of features selected, while imposing upper bounds on the false positive and false negative rates. The problem is written as an integer linear problem plus a quadratic convex problem for Support Vector Machines with both linear and radial kernels. The reported numerical experience demonstrates the usefulness of the proposed Feature Selection procedure. Indeed, our results on benchmark data sets show that a substantial decrease in the number of features is obtained, whilst the desired trade-off between false positive and false negative rates is achieved. (C) 2018 Elsevier Ltd. All rights reserved.
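The key idea stated in the abstract (minimize the number of selected features subject to upper bounds on the false positive and false negative rates) can be sketched for a linear classifier as a mixed-integer linear program. This is only an illustrative sketch and not necessarily the formulation used in the paper. All notation is assumed here: training pairs $(x_i, y_i)$ with $x_i \in \mathbb{R}^p$ and $y_i \in \{-1, +1\}$, index sets $N$ and $P$ of negative and positive samples, binary variables $z_k$ (feature $k$ is selected) and $\zeta_i$ (sample $i$ is required to be correctly classified), a sufficiently large constant $M$, and user-chosen rate bounds $\alpha$ and $\beta$.

```latex
% Illustrative sketch only; symbols z, \zeta, M, \alpha, \beta, N, P are assumptions.
\begin{align*}
\min_{w,\, b,\, z,\, \zeta} \quad & \sum_{k=1}^{p} z_k
  && \text{(number of selected features)} \\
\text{s.t.} \quad
  & -M z_k \le w_k \le M z_k, && k = 1, \dots, p, \\
  & y_i \left( w^\top x_i + b \right) \ge 1 - M (1 - \zeta_i), && i = 1, \dots, n, \\
  & \sum_{i \in N} (1 - \zeta_i) \le \alpha \, |N|
  && \text{(bound on the false positive rate)}, \\
  & \sum_{i \in P} (1 - \zeta_i) \le \beta \, |P|
  && \text{(bound on the false negative rate)}, \\
  & z \in \{0, 1\}^p, \quad \zeta \in \{0, 1\}^n .
\end{align*}
```

In this sketch, a weight $w_k$ can be nonzero only if $z_k = 1$, and every sample with $\zeta_i = 1$ must lie on the correct side of the separating hyperplane, so the number of samples with $\zeta_i = 0$ in each class bounds that class's misclassifications on the training set. The convex quadratic problem mentioned in the abstract would then plausibly correspond to a standard SVM fit restricted to the selected features, although that reading is an assumption.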
Pages: 169-178
Number of pages: 10