A Weighted Principal Component Analysis and Its Application to Gene Expression Data

Cited by: 28
Authors
da Costa, Joaquim F. Pinto [1, 2]
Alonso, Hugo [3, 4, 5]
Roque, Luis [6]
Affiliations
[1] Univ Porto, Fac Ciencias, Dept Matemat, P-4169007 Oporto, Portugal
[2] Univ Porto CMUP, Ctr Matemat, Oporto, Portugal
[3] Univ Lusofona Porto, Fac Econ & Gestao, P-4000098 Oporto, Portugal
[4] Univ Aveiro, Dept Matemat, P-3810193 Aveiro, Portugal
[5] Univ Aveiro, CIDMA, Aveiro, Portugal
[6] Inst Super Engn Porto, Grp Invest Engn Conhecimento & Apoio Decisao GECA, P-4200072 Oporto, Portugal
Keywords
Correlation; principal component analysis; support vector machines; microarray data; gene selection; LYMPH-NODE METASTASIS; RANK MEASURE; CANCER; CLASSIFICATION; MICROARRAYS; CARCINOMAS; PROGNOSIS; CENTROIDS; SURVIVAL; MODELS;
DOI
10.1109/TCBB.2009.61
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
In this work, we introduce in the first part new developments in Principal Component Analysis (PCA) and in the second part a new method to select variables (genes in our application). Our focus is on problems where the values taken by each variable do not all have the same importance and where the data may be contaminated with noise and contain outliers, as is the case with microarray data. The usual PCA is not appropriate for this kind of problem. In this context, we propose the use of a new correlation coefficient as an alternative to Pearson's. This leads to a so-called weighted PCA (WPCA). In order to illustrate the features of our WPCA and compare it with the usual PCA, we consider the problem of analyzing gene expression data sets. In the second part of this work, we propose a new PCA-based algorithm to iteratively select the most important genes in a microarray data set. We show that this algorithm produces better results when our WPCA is used instead of the usual PCA. Furthermore, by using Support Vector Machines, we show that it can compete with the Significance Analysis of Microarrays algorithm.
Pages: 246 - 252
Page count: 7
Related papers
50 in total
  • [21] Principal Component Analysis of Complex Data and Application to Climatology
    Camiz, Sergio
    Creta, Silvia
    CLASSIFICATION, (BIG) DATA ANALYSIS AND STATISTICAL LEARNING, 2018, : 77 - 85
  • [22] Tropical Principal Component Analysis and Its Application to Phylogenetics
    Ruriko Yoshida
    Leon Zhang
    Xu Zhang
    Bulletin of Mathematical Biology, 2019, 81 : 568 - 597
  • [23] Tropical Principal Component Analysis and Its Application to Phylogenetics
    Yoshida, Ruriko
    Zhang, Leon
    Zhang, Xu
    BULLETIN OF MATHEMATICAL BIOLOGY, 2019, 81 (02) : 568 - 597
  • [24] The improve model of the principal component analysis and its application
    Lu, ZH
    Li, JG
    PROCEEDINGS OF 2002 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE & ENGINEERING, VOLS I AND II, 2002, : 417 - 421
  • [25] Weighted principal component analysis and its applications to improve FDC performance
    Yue, HH
    Tomoyasu, M
    2004 43RD IEEE CONFERENCE ON DECISION AND CONTROL (CDC), VOLS 1-5, 2004, : 4262 - 4267
  • [26] Component retention in principal component analysis with application to cDNA microarray data
    Cangelosi, Richard
    Goriely, Alain
    BIOLOGY DIRECT, 2007, 2 (1)
  • [27] Component retention in principal component analysis with application to cDNA microarray data
    Richard Cangelosi
    Alain Goriely
    Biology Direct, 2
  • [28] A novel dimension reduction algorithm based on weighted kernel principal analysis for gene expression data
    Liu, Wen Bo
    Liang, Sheng Nan
    Qin, Xi Wen
    PLOS ONE, 2021, 16 (10): : e0258326
  • [29] A review of independent component analysis application to microarray gene expression data
    Kong, Wei
    Vanderburg, Charles R.
    Gunshin, Hiromi
    Rogers, Jack T.
    Huang, Xudong
    BIOTECHNIQUES, 2008, 45 (05) : 501 - +
  • [30] Feature Selection in Gene Expression Data Using Principal Component Analysis and Rough Set Theory
    Mishra, Debahuti
    Dash, Rajashree
    Rath, Amiya Kumar
    Acharya, Milu
    SOFTWARE TOOLS AND ALGORITHMS FOR BIOLOGICAL SYSTEMS, 2011, 696 : 91 - 100