A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data

Cited by: 0

Authors:

Lai, En-Yu ^{[1,2]}

Chen, Yi-Hau ^{[3]}

Wu, Kun-Pin ^{[1]}

Affiliations:

[1] Natl Yang Ming Univ, Inst Biomed Informat, Taipei 11221, Taiwan

[2] Acad Sinica, Inst Informat Sci, Taiwan Int Grad Program, Bioinformat Program, Taipei 11529, Taiwan

[3] Acad Sinica, Inst Stat Sci, Taipei 11529, Taiwan

Source:

PLOS COMPUTATIONAL BIOLOGY | 2017 / Vol. 13 / Issue 06

Keywords:

GENE-SET ANALYSIS; JAK/STAT PATHWAYS; EXPRESSION DATA; MICROARRAY; TESTS; PI3K/PTEN/AKT/MTOR; RAF/MEK/ERK; REVEALS; CELLS; TERMS;

DOI:

10.1371/journal.pcbi.1005601

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data remains difficult because of limited sample size. This limitation also leads to the common practice of using a competitive null, which fundamentally treats genes or proteins as independent units. The independence assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, the sample covariance may not be a precise estimate if the sample size is very limited, which is usually the case for data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed from the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways with sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication.
We implemented the T2-statistic in an R package, T2GA, which is available at https://github.com/roqe/T2GA.
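
As a rough illustration of the idea in the abstract, the sketch below computes a generic Hotelling-type quadratic form T2 = z' Σ^{-1} z for one pathway, where Σ is filled in from prior knowledge rather than estimated from samples. It is not the T2GA implementation. The function names (`solve`, `t2_statistic`, `chi2_sf`), the example values, and the choices to put confidence scores on the off-diagonal with ones on the diagonal and to use a chi-square reference distribution are all assumptions made for illustration; the abstract does not specify these details.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def t2_statistic(z, sigma):
    """Quadratic form z' sigma^{-1} z for standardized expression ratios z."""
    w = solve(sigma, z)
    return sum(zi * wi for zi, wi in zip(z, w))

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        term = math.exp(-h)
        total = term
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return total
    total = math.erfc(math.sqrt(h))
    term = math.exp(-h) * math.sqrt(h) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= h / (k + 0.5)
    return total

# Hypothetical pathway of three proteins: standardized log-ratios and
# made-up pairwise confidence scores (assumed to give a positive-definite matrix).
z = [2.1, 1.8, -0.3]
sigma = [
    [1.0, 0.9, 0.4],
    [0.9, 1.0, 0.4],
    [0.4, 0.4, 1.0],
]
t2 = t2_statistic(z, sigma)
print("T2 =", t2, "p =", chi2_sf(t2, len(z)))
```

If z is multivariate normal with mean zero and covariance Σ, the quadratic form follows a chi-square distribution with as many degrees of freedom as there are proteins in the pathway, which is why the sketch uses that distribution for the p-value. The null hypothesis here concerns only the proteins inside the pathway, in the spirit of a self-contained null.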

Pages: 29
