A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data

Cited by: 0
Authors
Lai, En-Yu [1 ,2 ]
Chen, Yi-Hau [3 ]
Wu, Kun-Pin [1 ]
Affiliations
[1] Natl Yang Ming Univ, Inst Biomed Informat, Taipei 11221, Taiwan
[2] Acad Sinica, Inst Informat Sci, Taiwan Int Grad Program, Bioinformat Program, Taipei 11529, Taiwan
[3] Acad Sinica, Inst Stat Sci, Taipei 11529, Taiwan
Keywords
GENE-SET ANALYSIS; JAK/STAT PATHWAYS; EXPRESSION DATA; MICROARRAY; TESTS; PI3K/PTEN/AKT/MTOR; RAF/MEK/ERK; REVEALS; CELLS; TERMS;
DOI
10.1371/journal.pcbi.1005601
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data remains difficult because of limited sample size. This limitation also leads to the common practice of using a competitive null, which fundamentally treats genes or proteins as independent units. The independence assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, the sample covariance may not be a precise estimate if the sample size is very limited, which is usually the case for data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed from the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways with sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication.
We implemented the T2-statistic in an R package, T2GA, which is available at https://github.com/roqe/T2GA.
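
The idea of replacing the sample covariance with a knowledge-based one can be illustrated with a minimal sketch. This is not the T2GA implementation: the function names, the scaling of confidence scores by per-protein standard deviations, and the quadratic form under a zero-mean self-contained null are all assumptions made for illustration. Confidence scores are assumed to be rescaled to [0, 1] and to give a positive-definite matrix. Only the Python standard library is used.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs stay untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def knowledge_based_t2(log_ratios, std_devs, confidence):
    """Quadratic form x' S^-1 x for one pathway (hypothetical helper).

    log_ratios: expression log-ratios of the pathway's proteins.
    std_devs:   assumed per-protein standard deviations.
    confidence: symmetric matrix of interaction confidence scores in
                [0, 1], used here in place of sample correlations
                (an assumption of this sketch).
    """
    n = len(log_ratios)
    sigma = [
        [
            std_devs[i] * std_devs[j] * (1.0 if i == j else confidence[i][j])
            for j in range(n)
        ]
        for i in range(n)
    ]
    weighted = solve_linear(sigma, log_ratios)
    return sum(x * w for x, w in zip(log_ratios, weighted))


if __name__ == "__main__":
    linked = [[1.0, 0.5], [0.5, 1.0]]
    unlinked = [[1.0, 0.0], [0.0, 1.0]]
    # Two proteins changing in the same direction by one standard deviation.
    print(knowledge_based_t2([1.0, 1.0], [1.0, 1.0], unlinked))
    print(knowledge_based_t2([1.0, 1.0], [1.0, 1.0], linked))
```

In this toy case, the statistic for two co-regulated proteins is smaller when they are known to interact than when they are treated as independent, which is the behaviour the abstract argues for: concordant changes among associated proteins count as less independent evidence.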
Pages: 29