Ensemble Feature Learning of Genomic Data Using Support Vector Machine

Cited by: 26
Authors
Anaissi, Ali [1]
Goyal, Madhu [1]
Catchpoole, Daniel R. [2]
Braytee, Ali [1]
Kennedy, Paul J. [1]
Affiliations
[1] Univ Technol Sydney, Ctr Quantum Computat & Intelligent Syst QCIS, Fac Engn & Informat Technol, Broadway, NSW 2007, Australia
[2] Childrens Hosp Westmead, Childrens Canc Res Unit, Tumour Bank, Locked Bag 4001, Westmead, NSW 2145, Australia
Source
PLOS ONE | 2016 / Volume 11 / Issue 06
Keywords
GENE SELECTION; CLUSTERING ANALYSIS; MICROARRAY DATA; RANDOM FOREST; CLASSIFICATION; PATTERNS; TUMOR; BIAS;
DOI
10.1371/journal.pone.0157330
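Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code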
07 ; 0710 ; 09 ;
Abstract
Identifying a subset of genes that captures the information needed to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively for gene selection and classification. A testament to this is random forest, which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines (SVMs) has only recently received attention, and mostly for classification rather than gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) method for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy that underlies the RFE algorithm. The rationale is that building an ensemble of SVM models on bootstrap samples drawn randomly from the training set produces different feature rankings, which are subsequently aggregated into a single feature ranking. As a result, the decision to eliminate features is based on the rankings of multiple SVM models rather than on one particular model. Moreover, this approach addresses the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that ESVM-RFE achieves on average 9% better accuracy than SVM-RFE and 5% better than a random forest based approach. The genes selected by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD), which reveals significant clusters in the selected data.
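The procedure outlined in the abstract (balanced bootstrap samples, one SVM per sample, aggregated rankings, backward elimination) can be illustrated with a minimal sketch. This is not the authors' implementation. Several details are assumptions made only for illustration: the linear SVM is trained by plain batch subgradient descent without a bias term, each model's squared weights are normalised and summed to aggregate rankings, one feature is dropped per round, and the "nearly balanced" bootstrap is taken to mean drawing the minority-class count from every class. The function names and the toy data generator are hypothetical.

```python
import random


def balanced_bootstrap(y, rng):
    """Draw row indices with replacement, the same number from each class.
    Assumption: the per-class count equals the size of the smallest class."""
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_per_class = min(len(idx) for idx in by_class.values())
    rows = []
    for idx in by_class.values():
        rows.extend(rng.choices(idx, k=n_per_class))
    return rows


def train_linear_svm(X, y, epochs=50, lr=0.1, lam=0.01):
    """Batch subgradient descent on the L2-regularised hinge loss.
    Labels must be -1 or +1; no bias term (a simplification)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1.0:  # only margin violators contribute
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def esvm_rfe(X, y, n_keep, n_models=10, seed=0):
    """Return the indices of the n_keep features that survive elimination."""
    rng = random.Random(seed)
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        scores = [0.0] * len(active)
        for _ in range(n_models):
            rows = balanced_bootstrap(y, rng)
            Xb = [[X[i][j] for j in active] for i in rows]
            yb = [y[i] for i in rows]
            w = train_linear_svm(Xb, yb)
            total = sum(wj * wj for wj in w) or 1.0
            # Aggregate: sum of each model's normalised squared weights.
            for k, wj in enumerate(w):
                scores[k] += wj * wj / total
        worst = min(range(len(active)), key=scores.__getitem__)
        del active[worst]  # backward elimination of the weakest feature
    return active


def make_toy_data(n=40, d=8, seed=1):
    """Imbalanced toy data (3:1); only feature 0 carries class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 4 else -1
        row = [rng.gauss(0.0, 1.0) for _ in range(d)]
        row[0] = 3.0 * label + rng.gauss(0.0, 0.3)
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("selected features:", esvm_rfe(X, y, n_keep=2))
```

Because every elimination decision uses scores pooled over several bootstrap models, a single unlucky resample is less likely to discard an informative feature than it would be in single-model SVM-RFE. On real microarray data one would normally drop a fraction of features per round rather than one at a time, and use an established SVM solver in place of the hand-written trainer.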
Pages: 17