Integration of feature vector selection and support vector machine for classification of imbalanced data

Cited by: 29
Authors
Liu, Jie [1]
Zio, Enrico [2, 3, 4]
Affiliations
[1] Beihang Univ, Sch Reliabil & Syst Engn, 37 Xueyuan Rd, Beijing, Peoples R China
[2] Politecn Milan, Energy Dept, Milan, Italy
[3] PSL Univ Paris, MINES ParisTech, Ctr Rech Risques & Crises CRC, Paris, France
[4] Kyung Hee Univ, Dept Nucl Engn, Seoul, South Korea
Keywords
Classification; Feature Vector Selection; Imbalanced data; Support Vector Machine; Separability; CLASSIFIERS; RECOGNITION; MARGIN; MODEL; SVM
DOI
10.1016/j.asoc.2018.11.045
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Support Vector Machine (SVM) has been widely developed for tackling classification problems. Imbalanced data exist in many practical classification problems where the minority class is usually the one of interest. Undersampling is a popular solution for such problems. However, it risks losing useful information in the original data. At the same time, tuning the hyperparameters in SVM is also challenging. By analyzing the geometrical meaning of kernel methods, an approach is proposed in this paper that combines a modified Feature Vector Selection (FVS) method with maximal between-class separability and an easy-tuning version of SVM, i.e. Feature Vector Regression (FVR), proposed in our previous work. In this paper, the modified FVS method selects a small number of data points that can linearly represent the whole dataset in the Reproducing Kernel Hilbert Space (RKHS), and the selected data points also give a maximal separability of the imbalanced data in RKHS. The FVR model is also solved analytically, as in least-squares SVM. The decision threshold for classification is optimized to maximize the predefined accuracy metric. Twenty-six imbalanced datasets are considered and comparisons are carried out with several SVM-based methods for imbalanced data. Statistical tests show the effectiveness of the proposed method. (C) 2018 Elsevier B.V. All rights reserved.
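Illustrative note (not part of the original record): the abstract states that the decision threshold is optimized to maximize a predefined accuracy metric. The sketch below shows one plausible way such a search could look. It is not the authors' implementation; the choice of G-mean as the metric, the function names `g_mean` and `best_threshold`, and the toy scores are all assumptions made for illustration.

```python
from math import sqrt

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (labels are +1 / -1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return 0.0
    return sqrt((tp / pos) * (tn / neg))

def best_threshold(scores, y_true, metric=g_mean):
    """Scan candidate thresholds and keep the one with the highest metric."""
    ordered = sorted(set(scores))
    # Candidates: one value below the minimum score, plus the midpoints
    # between consecutive distinct scores.
    candidates = [ordered[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best_t, best_m = candidates[0], -1.0
    for t in candidates:
        y_pred = [1 if s > t else -1 for s in scores]
        m = metric(y_true, y_pred)
        if m > best_m:  # strict comparison keeps the lowest threshold on ties
            best_t, best_m = t, m
    return best_t, best_m

# Hypothetical toy data: 2 minority (+1) and 6 majority (-1) samples,
# with real-valued scores standing in for a regression model's outputs.
scores = [0.9, 0.2, 0.3, 0.1, -0.4, -0.6, -0.8, -0.9]
labels = [1, 1, -1, -1, -1, -1, -1, -1]
threshold, value = best_threshold(scores, labels)
```

On imbalanced data the default threshold of 0 is often not the best cut. In this toy example the search settles on a threshold between 0.1 and 0.2, which recovers both minority samples at the cost of one majority error.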
Pages: 702 - 711
Page count: 10
Related papers
50 in total
  • [31] Probabilistic Feature Selection and Classification Vector Machine
    Jiang, Bingbing
    Li, Chang
    de Rijke, Maarten
    Yao, Xin
    Chen, Huanhuan
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2019, 13 (02)
  • [32] Feature selection in the Laplacian support vector machine
    Lee, Sangjun
    Park, Changyi
    Koo, Ja-Yong
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2011, 55 (01) : 567 - 577
  • [33] A Semisupervised Feature Selection with Support Vector Machine
    Dai, Kun
    Yu, Hong-Yi
    Li, Qing
    JOURNAL OF APPLIED MATHEMATICS, 2013,
  • [34] On the Probability of Feature Selection in Support Vector Classification
    Liu, Qunfeng
    Yao, Lan
    2013 IEEE INTERNATIONAL CONFERENCE ON SERVICE OPERATIONS AND LOGISTICS, AND INFORMATICS (SOLI), 2013, : 334 - 339
  • [35] Genetic Support Vector Classification and Feature Selection
    Mejia-Guevara, Ivan
    Kuri-Morales, Angel
    PROCEEDINGS OF THE SPECIAL SESSION OF THE SEVENTH MEXICAN INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE - MICAI 2008, 2008, : 75 - +
  • [36] An adaboost support vector machine ensemble method with integration of instance selection and feature selection
    Yang, Honghui
    Wang, Yun
    Sun, Jincai
    Dai, Jian
    Li, Ya'an
    Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University, 2014, 48 (12) : 63 - 68
  • [37] Weighted support vector machine for extremely imbalanced data
    Mun, Jongmin
    Bang, Sungwan
    Kim, Jaeoh
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2025, 203
  • [38] Performance of Support Vector Machine in Imbalanced Data Set
    Novakovic, Jasmina
    Markovic, Suzana
    2020 19TH INTERNATIONAL SYMPOSIUM INFOTEH-JAHORINA (INFOTEH), 2020,
  • [39] A method for feature selection on microarray data using support vector machine
    Huang, Xiao Bing
    Tang, Jian
    DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2006, 4081 : 513 - 523
  • [40] An improved Support Vector Machine for the classification of imbalanced biological datasets
    Wang, Haiying
    Zheng, Huiru
    ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS, PROCEEDINGS: WITH ASPECTS OF THEORETICAL AND METHODOLOGICAL ISSUES, 2008, 5226 : 63 - +