Ensemble Feature Learning of Genomic Data Using Support Vector Machine

Cited by: 26
Authors
Anaissi, Ali [1]
Goyal, Madhu [1]
Catchpoole, Daniel R. [2]
Braytee, Ali [1]
Kennedy, Paul J. [1]
Affiliations
[1] Univ Technol Sydney, Ctr Quantum Computat & Intelligent Syst QCIS, Fac Engn & Informat Technol, Broadway, NSW 2007, Australia
[2] Childrens Hosp Westmead, Childrens Canc Res Unit, Tumour Bank, Locked Bag 4001, Westmead, NSW 2145, Australia
Source
PLOS ONE | 2016 / Vol. 11 / No. 06
Keywords
GENE SELECTION; CLUSTERING ANALYSIS; MICROARRAY DATA; RANDOM FOREST; CLASSIFICATION; PATTERNS; TUMOR; BIAS;
DOI
10.1371/journal.pone.0157330
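Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract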
Identifying a subset of genes that captures the information needed to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Random forest is a testament to this: it combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines (SVMs) has only recently received attention, and mostly for classification rather than gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy that underlies the RFE algorithm. The rationale is that building an ensemble of SVM models on bootstrap samples drawn randomly from the training set produces different feature rankings, which are subsequently aggregated into a single feature ranking. As a result, the decision to eliminate features is based on the rankings of multiple SVM models rather than on one particular model. Moreover, this approach addresses the problem of imbalanced datasets by constructing nearly balanced bootstrap samples. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. On the childhood leukaemia dataset, ESVM-RFE achieves on average 9% better accuracy than SVM-RFE and 5% better than the random forest based approach. The genes selected by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD), which reveals significant clusters in the selected data.
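The procedure described in the abstract (balanced bootstrap samples, one SVM per sample, aggregated rankings, backward elimination) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not specify. The function names `train_linear_svm`, `balanced_bootstrap`, and `esvm_rfe` are hypothetical. The linear SVM is trained with a simple Pegasos-style subgradient method and no bias term, standing in for a full SVM solver. Rankings are aggregated by summing squared weights across models. A fixed fraction `drop_frac` of features is eliminated per round. Labels are assumed to be +1 and -1.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=50, rng=None):
    """Pegasos-style subgradient descent for a linear SVM without bias (assumed stand-in solver)."""
    rng = rng or random.Random(0)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink from the L2 penalty
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def balanced_bootstrap(y, rng):
    """Draw, with replacement, equally many sample indices from each class.

    Assumes labels are +1 / -1 and both classes are present.
    """
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    k = min(len(pos), len(neg))
    return [rng.choice(pos) for _ in range(k)] + [rng.choice(neg) for _ in range(k)]

def esvm_rfe(X, y, n_keep, n_models=10, drop_frac=0.2, seed=0):
    """Return the indices of the n_keep features that survive ensemble backward elimination."""
    rng = random.Random(seed)
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        scores = [0.0] * len(active)
        for _ in range(n_models):
            idx = balanced_bootstrap(y, rng)
            Xb = [[X[i][j] for j in active] for i in idx]
            yb = [y[i] for i in idx]
            w = train_linear_svm(Xb, yb, rng=rng)
            for p, wj in enumerate(w):
                scores[p] += wj * wj  # assumed aggregation: sum of squared weights
        # Drop the lowest-scoring features, at least one, never going below n_keep.
        n_drop = max(1, min(int(len(active) * drop_frac), len(active) - n_keep))
        ranked = sorted(range(len(active)), key=lambda p: scores[p])
        dropped = set(ranked[:n_drop])
        active = [f for p, f in enumerate(active) if p not in dropped]
    return active
```

Calling `esvm_rfe(X, y, n_keep=50)` on a samples-by-genes matrix `X` (a list of rows) would return the indices of the 50 retained genes. Because each elimination decision uses scores pooled over `n_models` bootstrap models, a single unrepresentative model has less influence than in plain SVM-RFE.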
Pages: 17