A Weighted Principal Component Analysis and Its Application to Gene Expression Data

Times cited: 28
Authors
da Costa, Joaquim F. Pinto [1, 2]
Alonso, Hugo [3, 4, 5]
Roque, Luis [6]
Affiliations
[1] Univ Porto, Fac Ciencias, Dept Matemat, P-4169007 Oporto, Portugal
[2] Univ Porto CMUP, Ctr Matemat, Oporto, Portugal
[3] Univ Lusofona Porto, Fac Econ & Gestao, P-4000098 Oporto, Portugal
[4] Univ Aveiro, Dept Matemat, P-3810193 Aveiro, Portugal
[5] Univ Aveiro, CIDMA, Aveiro, Portugal
[6] Inst Super Engn Porto, Grp Invest Engn Conhecimento & Apoio Decisao GECA, P-4200072 Oporto, Portugal
Keywords
Correlation; principal component analysis; support vector machines; microarray data; gene selection; LYMPH-NODE METASTASIS; RANK MEASURE; CANCER; CLASSIFICATION; MICROARRAYS; CARCINOMAS; PROGNOSIS; CENTROIDS; SURVIVAL; MODELS
DOI
10.1109/TCBB.2009.61
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
In the first part of this work, we introduce new developments in Principal Component Analysis (PCA); in the second part, we introduce a new method for selecting variables (genes, in our application). Our focus is on problems where the values taken by each variable do not all have the same importance, and where the data may be contaminated with noise and contain outliers, as is the case with microarray data. The usual PCA is not appropriate for this kind of problem. In this context, we propose the use of a new correlation coefficient as an alternative to Pearson's, which leads to a so-called weighted PCA (WPCA). To illustrate the features of our WPCA and compare it with the usual PCA, we consider the problem of analyzing gene expression data sets. In the second part of this work, we propose a new PCA-based algorithm that iteratively selects the most important genes in a microarray data set. We show that this algorithm produces better results when our WPCA is used instead of the usual PCA. Furthermore, using Support Vector Machines, we show that it is competitive with the Significance Analysis of Microarrays algorithm.
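The abstract does not give the formula for the new correlation coefficient, so the following is only a minimal sketch, assuming NumPy, of the general mechanism behind a correlation-driven PCA: the principal components are the eigenvectors of a correlation matrix, and the correlation function is pluggable. Spearman's rank correlation is used here purely as a stand-in for a more outlier-resistant coefficient; it is not the weighted coefficient proposed in the paper, and the function names (`pearson_corr`, `spearman_corr`, `correlation_pca`) are hypothetical.

```python
import numpy as np


def pearson_corr(X):
    """Pearson correlation matrix of the columns of X (samples x variables)."""
    return np.corrcoef(X, rowvar=False)


def spearman_corr(X):
    """Stand-in coefficient: Spearman's rho, NOT the paper's weighted coefficient.

    Computed as the Pearson correlation of column-wise ranks; ties are broken
    by order of appearance (a simplification).
    """
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)


def correlation_pca(X, corr=pearson_corr, n_components=2):
    """PCA via eigendecomposition of a correlation matrix.

    Returns the leading eigenvalues (descending) and their eigenvectors
    (loadings, one column per component).
    """
    R = corr(X)
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh: symmetric R, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]


# Usage: two strongly related variables plus one unrelated variable,
# then a single gross outlier injected into variable 0.
rng = np.random.default_rng(0)
n = 50
g1 = rng.normal(size=n)
X = np.column_stack([g1, g1 + 0.1 * rng.normal(size=n), rng.normal(size=n)])
X_out = X.copy()
X_out[0, 0] = 100.0

for name, f in [("Pearson", pearson_corr), ("Spearman", spearman_corr)]:
    vals, _ = correlation_pca(X_out, corr=f)
    # Trace of a correlation matrix equals p, so vals / p is the variance fraction.
    print(name, vals / X.shape[1])
```

Because every correlation matrix has ones on its diagonal, its eigenvalues sum to the number of variables p, so `vals / p` is the fraction of total variance each component accounts for. Comparing coefficients only requires swapping the `corr` argument; a weighted coefficient of the kind the abstract describes would plug in the same way.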
Pages: 246 - 252
Number of pages: 7
Related papers (50 in total)
  • [1] Application of Principal Component Analysis in Weighted Stacking of Seismic Data
    Xie, Jianyong
    Chen, Wei
    Zhang, Dong
    Zu, Shaohuan
    Chen, Yangkang
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2017, 14 (08) : 1213 - 1217
  • [2] Principal component analysis for clustering gene expression data
    Yeung, KY
    Ruzzo, WL
    BIOINFORMATICS, 2001, 17 (09) : 763 - 774
  • [3] Integrative sparse principal component analysis of gene expression data
    Liu, Mengque
    Fan, Xinyan
    Fang, Kuangnan
    Zhang, Qingzhao
    Ma, Shuangge
    GENETIC EPIDEMIOLOGY, 2017, 41 (08) : 844 - 865
  • [4] Gene expression data classification with kernel principal component analysis
    Liu, ZQ
    Chen, DC
    Bensmail, H
    JOURNAL OF BIOMEDICINE AND BIOTECHNOLOGY, 2005, (02) : 155 - 159
  • [5] Chemical processes monitoring based on weighted principal component analysis and its application
    Jiang, Qingchao
    Yan, Xuefeng
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2012, 119 : 11 - 20
  • [6] Block principal component analysis with application to gene microarray data classification
    Liu, AY
    Zhang, Y
    Gehan, E
    Clarke, R
    STATISTICS IN MEDICINE, 2002, 21 (22) : 3465 - 3474
  • [7] Enhancements to a Geographically Weighted Principal Component Analysis in the Context of an Application to an Environmental Data Set
    Harris, Paul
    Clarke, Annemarie
    Juggins, Steve
    Brunsdon, Chris
    Charlton, Martin
    GEOGRAPHICAL ANALYSIS, 2015, 47 (02) : 146 - 172
  • [8] On weighted principal component analysis for interval-valued data and its dynamic feature
    Sato-Ilic, Mika
    Oshima, Junya
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2006, 2 (01) : 69 - 82
  • [9] Weighted Principal Component Analysis
    Fan, Zizhu
    Liu, Ergen
    Xu, Baogen
    ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, PT III, 2011, 7004 : 569 - 574
  • [10] Heuristic principal component analysis-based unsupervised feature extraction and its application to gene expression analysis of amyotrophic lateral sclerosis data sets
    Taguchi, Y-h.
    Iwadate, Mitsuo
    Umeyama, Hideaki
    2015 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (CIBCB), 2015 : 8 - 17