A Partial Least Squares Algorithm for Microarray Data Analysis Using the VIP Statistic for Gene Selection and Binary Classification

Authors
Burguillo, Francisco J. [1]
Corchete, Luis A. [1]
Martin, Javier [2]
Barrera, Inmaculada [2]
Bardsley, William G. [3]
Affiliations
[1] Univ Salamanca, Fac Farm, Dept Quim Fis, Salamanca 37080, Spain
[2] Univ Salamanca, Fac Med, Dept Estadist, Salamanca 37080, Spain
[3] Univ Manchester, Sch Biol Sci, Manchester M13 9PL, Lancs, England
Keywords
Classification; gene selection; microarray; partial least squares; PLS; VIP statistic; variable selection; explanatory variables; regression
DOI
Not available
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
An important application of microarray technology is the assignment of new subjects to known clinical groups (class prediction), but the huge number of screened genes and the small number of samples make this task difficult. To overcome this problem, the usual approach has been either to extract a small subset of significant genes (gene selection) or to use the whole set of genes to build latent components (dimension reduction), and then to apply a standard multivariate classification procedure. Alternatively, as reported in the present work, both aims (gene selection and class prediction) can be achieved at the same time with methods based on Partial Least Squares (PLS). We present an iterative PLS algorithm based on backward variable elimination through the "Variable Influence on Projection" (VIP) statistic, which finds an optimal PLS model using training and test sets. The procedure iteratively reduces the number of selected genes and, at the same time, finds the number of PLS factors that gives the best classification performance. The approach is simple: it uses a single mathematical method, retains the identity of the discriminatory genes, and builds an optimal predictive model with fast computation. The algorithm runs as a module of the SIMFIT statistical package, where the optimal model and datasets can be re-run to interpret the system further through additional PLS options, such as score and loading plots, or the class assignment of new samples. The proposed algorithm was tested on simulated data under different scenarios that occur in microarray analysis, and the results were compared against other classification methods such as KNN, PAM, SVM, RF and standard PLS.
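The abstract describes the algorithm only in outline. The Python/NumPy sketch below illustrates the general idea under stated assumptions and is not the SIMFIT implementation: the function names (`pls1_fit`, `pls1_predict`, `vip_scores`, `backward_vip_selection`) and the parameters `max_comp`, `drop_frac` and `min_genes` are hypothetical. It assumes that the two classes are coded 0/1 and fitted with a NIPALS PLS1 regression, that a sample is assigned to class 1 when its predicted response exceeds 0.5, and that VIP takes its usual form VIP_j = sqrt(p · Σ_a SS_a w_ja² / Σ_a SS_a), where p is the number of genes, w_a is the unit-norm weight vector of factor a, and SS_a = q_a² (t_a · t_a) is the response variation explained by that factor. At each iteration, the number of factors with the best test-set accuracy is chosen, and an assumed fixed fraction of the lowest-VIP genes is dropped.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Fit a single-response PLS model (PLS1) by NIPALS; y holds 0/1 class labels."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean              # centred, then deflated each step
    W, P, Q, SS = [], [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr                            # weights follow covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                         # no covariance left between X and y
            break
        w /= norm
        t = Xr @ w                               # X scores
        tt = t @ t
        p = Xr.T @ t / tt                        # X loadings
        q = (yr @ t) / tt                        # y loading
        Xr = Xr - np.outer(t, p)                 # deflate X and y
        yr = yr - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
        SS.append(q * q * tt)                    # y variation explained by this factor
    if not W:
        raise ValueError("X carries no information about y")
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)          # coefficients for centred X
    return {"x_mean": x_mean, "y_mean": y_mean, "B": B, "W": W, "SS": np.array(SS)}


def pls1_predict(model, X):
    """Assign class 1 when the fitted response exceeds 0.5 (assumed threshold)."""
    y_hat = model["y_mean"] + (X - model["x_mean"]) @ model["B"]
    return (y_hat > 0.5).astype(int)


def vip_scores(model):
    """VIP_j = sqrt(p * sum_a SS_a * w_ja**2 / sum_a SS_a), with unit-norm w_a."""
    W, SS = model["W"], model["SS"]
    p = W.shape[0]
    return np.sqrt(p * ((W ** 2) @ SS) / SS.sum())


def backward_vip_selection(X_tr, y_tr, X_te, y_te, max_comp=5, drop_frac=0.1, min_genes=2):
    """Drop low-VIP genes iteratively; keep the subset and factor count with the best test accuracy."""
    min_genes = max(1, min_genes)
    genes = np.arange(X_tr.shape[1])
    best = {"acc": -1.0}
    while True:
        # Try 1..n_max PLS factors on the current gene subset.
        n_max = min(max_comp, len(genes), X_tr.shape[0] - 1)
        fits = []
        for a in range(1, n_max + 1):
            model = pls1_fit(X_tr[:, genes], y_tr, a)
            acc = np.mean(pls1_predict(model, X_te[:, genes]) == y_te)
            fits.append((acc, a, model))
        acc, a, model = max(fits, key=lambda f: f[0])   # first maximum = fewest factors
        if acc >= best["acc"]:                          # ties favour the smaller subset
            best = {"acc": acc, "genes": genes.copy(), "n_comp": a}
        if len(genes) <= min_genes:
            return best
        # Remove the drop_frac share of genes with the smallest VIP (at least one gene).
        n_drop = min(max(1, int(drop_frac * len(genes))), len(genes) - min_genes)
        keep = np.argsort(vip_scores(model))[n_drop:]
        genes = np.sort(genes[keep])


if __name__ == "__main__":
    # Hypothetical simulated data: 200 genes, of which genes 0-4 are shifted
    # by +2 in class 1; 40 training and 40 test samples.
    rng = np.random.default_rng(0)

    def simulate(n, n_genes=200, n_informative=5):
        y = np.repeat([0, 1], n // 2)
        X = rng.normal(size=(n, n_genes))
        X[y == 1, :n_informative] += 2.0
        return X, y

    X_tr, y_tr = simulate(40)
    X_te, y_te = simulate(40)
    best = backward_vip_selection(X_tr, y_tr, X_te, y_te)
    print("test accuracy:", best["acc"])
    print("PLS factors:", best["n_comp"])
    print("selected genes:", best["genes"])
```

In this sketch, ties are resolved in favour of fewer factors within an iteration and fewer genes across iterations, in line with the aim of a small gene subset. Because the same test set is used both to select the model and to measure its accuracy, the reported accuracy is optimistic; an independent validation set would be needed for an unbiased estimate.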
Pages: 348-359
Number of pages: 12