A Partial Least Squares Algorithm for Microarray Data Analysis Using the VIP Statistic for Gene Selection and Binary Classification

Authors
Burguillo, Francisco J. [1 ]
Corchete, Luis A. [1 ]
Martin, Javier [2 ]
Barrera, Inmaculada [2 ]
Bardsley, William G. [3 ]
Affiliations
[1] Univ Salamanca, Fac Farm, Dept Quim Fis, Salamanca 37080, Spain
[2] Univ Salamanca, Fac Med, Dept Estadist, Salamanca 37080, Spain
[3] Univ Manchester, Sch Biol Sci, Manchester M13 9PL, Lancs, England
Keywords
Classification; gene selection; microarray; partial least squares; PLS; VIP statistic; variable selection; explanatory variables; regression
DOI
Not available
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
An important application of microarray technology is the assignment of new subjects to known clinical groups (class prediction), but the huge number of screened genes and the small number of samples make this task difficult. To overcome this problem, the usual approach has been either to extract a small subset of significant genes (gene selection) or to use the whole set of genes to build latent components (dimension reduction), and then to apply a standard multivariate classification procedure. Alternatively, both aims (gene selection and class prediction) can be achieved at the same time by using methods based on Partial Least Squares (PLS), as reported in the present work. We present an iterative PLS algorithm based on backward variable elimination through the "Variable Influence on Projection" (VIP) statistic, which finds an optimal PLS model using training and test sets. It simultaneously reduces the number of selected genes through an iterative procedure and finds the number of PLS factors that gives the best classification performance. It is a simple approach that uses only one mathematical method, retains the identification of discriminatory genes, and builds an optimal predictive model with fast computation. The algorithm runs as a module of the SIMFIT statistical package, where the optimal model and datasets can be re-run to further interpret the system through additional PLS options, such as scores and loadings plots, or class assignment of new samples. The proposed algorithm was tested on simulated data under different scenarios that occur in microarray analysis. The results are also compared against other classification methods such as KNN, PAM, SVM, RF and standard PLS.
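To make the idea of VIP-driven backward elimination concrete, here is a minimal NumPy sketch. It is not the SIMFIT implementation: the abstract does not give the elimination rule, the stopping rule, or the class threshold, so the fraction dropped per iteration (`drop_frac`), the floor `min_genes`, the cap `max_comp`, the 0.5 cut-off on 0/1 labels, and all function names are assumptions made for illustration. The VIP formula is the commonly used one, VIP_j = sqrt(p * sum_a(SS_a * w_ja^2) / sum_a(SS_a)), where w_a are the unit-norm PLS weights and SS_a is the part of the response explained by component a.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """PLS1 by NIPALS on centred data; returns coefficients, means and VIP."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q, ss = [], [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        norm = np.linalg.norm(w)
        if norm < 1e-12:              # nothing left to explain
            break
        w = w / norm                  # unit-norm weight vector
        t = Xr @ w                    # scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        qa = (yr @ t) / tt            # y loading
        Xr = Xr - np.outer(t, p)      # deflate X and y
        yr = yr - qa * t
        W.append(w); P.append(p); q.append(qa); ss.append(qa**2 * tt)
    if not W:
        raise ValueError("no PLS component could be extracted")
    W, P, q, ss = np.array(W).T, np.array(P).T, np.array(q), np.array(ss)
    coef = W @ np.linalg.solve(P.T @ W, q)
    vip = np.sqrt(X.shape[1] * (W**2 @ ss) / ss.sum())
    return coef, x_mean, y_mean, vip

def vip_backward_selection(X_tr, y_tr, X_te, y_te,
                           max_comp=5, drop_frac=0.2, min_genes=5):
    """Drop low-VIP genes iteratively; keep the best model on the test set.

    drop_frac, min_genes, max_comp and the 0.5 threshold are illustrative
    assumptions, not values taken from the paper.
    """
    genes = np.arange(X_tr.shape[1])
    best = None
    while True:
        top = min(max_comp, len(genes), len(y_tr) - 1)
        fits = []
        for a in range(1, top + 1):
            coef, xm, ym, vip = pls1_fit(X_tr[:, genes], y_tr, a)
            pred = ((X_te[:, genes] - xm) @ coef + ym > 0.5).astype(int)
            fits.append((np.mean(pred == y_te), a, vip))
        # Highest test accuracy; ties go to the model with fewer factors.
        acc, a, vip = max(fits, key=lambda f: (f[0], -f[1]))
        # On equal accuracy, prefer the later (smaller) gene subset.
        if best is None or acc >= best["accuracy"]:
            best = {"accuracy": acc, "n_comp": a, "genes": genes.copy()}
        if len(genes) <= min_genes:
            break
        n_keep = max(min_genes, int(len(genes) * (1 - drop_frac)))
        genes = genes[np.sort(np.argsort(vip)[-n_keep:])]
    return best

# Simulated example: 200 genes, of which only the first 5 separate the classes.
rng = np.random.default_rng(0)
y = np.tile([0, 1], 25)
X = rng.normal(size=(50, 200))
X[:, :5] += 3.0 * y[:, None]
best = vip_backward_selection(X[:30], y[:30], X[30:], y[30:])
print(best["accuracy"], best["n_comp"], best["genes"])
```

Because the same test set is used both to choose the model and to score it, the reported accuracy in a sketch like this is optimistic; an independent validation set or an outer resampling loop would be needed for an unbiased estimate.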
Pages: 348-359
Page count: 12
Related papers
50 in total
  • [21] Feature selection for microarray data using least squares SVM and particle swarm optimization
    Tang, EK
    Suganthan, PN
    Yao, X
    [J]. Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, 2005, : 9 - 16
  • [22] Hybridization of Genetic and Quantum Algorithm for Gene Selection and Classification of Microarray Data
    Abderrahim, Allani
    Talbi, El-Ghazali
    Khaled, Mellouli
    [J]. 2009 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-5, 2009, : 2226 - +
  • [23] HYBRIDIZATION OF GENETIC AND QUANTUM ALGORITHM FOR GENE SELECTION AND CLASSIFICATION OF MICROARRAY DATA
    Abderrahim, Allani
    Talbi, El-Ghazali
    Khaled, Mellouli
    [J]. INTERNATIONAL JOURNAL OF FOUNDATIONS OF COMPUTER SCIENCE, 2012, 23 (02) : 431 - 444
  • [24] A hybrid LDA and genetic algorithm for gene selection and classification of microarray data
    Bonilla Huerta, Edmundo
    Duval, Beatrice
    Hao, Jin-Kao
    [J]. NEUROCOMPUTING, 2010, 73 (13-15) : 2375 - 2383
  • [25] A Partial Least Squares based algorithm for parsimonious variable selection
    Mehmood, Tahir
    Martens, Harald
    Saebo, Solve
    Warringer, Jonas
    Snipen, Lars
    [J]. ALGORITHMS FOR MOLECULAR BIOLOGY, 2011, 6
  • [26] A Partial Least Squares based algorithm for parsimonious variable selection
    Tahir Mehmood
    Harald Martens
    Solve Sæbø
    Jonas Warringer
    Lars Snipen
    [J]. Algorithms for Molecular Biology, 6
  • [27] Principal balances of compositional data for regression and classification using partial least squares
    Nesrstova, V.
    Wilms, I.
    Palarea-Albaladejo, J.
    Filzmoser, P.
    Martin-Fernandez, J. A.
    Friedecky, D.
    Hron, K.
    [J]. JOURNAL OF CHEMOMETRICS, 2023, 37 (12)
  • [28] Spectral data classification using locally weighted partial least squares classifier
    Song, Weiran
    Wang, Hui
    Maguire, Paul
    Nibouche, Omar
    [J]. DATA SCIENCE AND KNOWLEDGE ENGINEERING FOR SENSING DECISION SUPPORT, 2018, 11 : 700 - 707
  • [29] Missing values estimation in microarray data with partial least squares regression
    Yang, Kun
    Li, Jianzhong
    Wang, Chaokun
    [J]. COMPUTATIONAL SCIENCE - ICCS 2006, PT 2, PROCEEDINGS, 2006, 3992 : 662 - 669
  • [30] Gene Subset Selection for Leukemia Classification Using Microarray Data
    Fajila, Mohamed Nisper Fathima
    [J]. CURRENT BIOINFORMATICS, 2019, 14 (04) : 353 - 358