A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data

Authors
Lai, En-Yu [1, 2]
Chen, Yi-Hau [3]
Wu, Kun-Pin [1]
Affiliations
[1] Natl Yang Ming Univ, Inst Biomed Informat, Taipei 11221, Taiwan
[2] Acad Sinica, Inst Informat Sci, Taiwan Int Grad Program, Bioinformat Program, Taipei 11529, Taiwan
[3] Acad Sinica, Inst Stat Sci, Taipei 11529, Taiwan
Keywords
GENE-SET ANALYSIS; JAK/STAT PATHWAYS; EXPRESSION DATA; MICROARRAY; TESTS; PI3K/PTEN/AKT/MTOR; RAF/MEK/ERK; REVEALS; CELLS; TERMS;
DOI
10.1371/journal.pcbi.1005601
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data remains difficult because of limited sample size. This limitation also leads to the common practice of using a competitive null, which fundamentally treats genes or proteins as independent units. The independence assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, the sample covariance may not be a precise estimate if the sample size is very limited, which is usually the case for data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed from the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways with sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication.
We implemented the T2-statistic in an R package, T2GA, which is available at https://github.com/roqe/T2GA.
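To make the idea of a knowledge-based covariance matrix concrete, the following minimal Python sketch computes a quadratic-form statistic z' Σ⁻¹ z for one pathway. It is an illustration under stated assumptions, not the T2GA implementation: the function names (`solve_linear`, `knowledge_covariance`, `t2_statistic`), the protein labels, the toy confidence score of 0.5, and the choice to place raw confidence scores on the off-diagonal of a unit-diagonal matrix are all hypothetical. The abstract does not specify how the scores enter the matrix or how the expression ratios are standardized.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def knowledge_covariance(proteins, confidence):
    """Build a unit-diagonal matrix from pairwise confidence scores.

    Assumption: `confidence` maps frozenset({a, b}) to a score in [0, 1);
    pairs without an entry get 0. For more than two proteins such a matrix
    is not guaranteed to be positive definite and may need adjustment.
    """
    n = len(proteins)
    sigma = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        score = confidence.get(frozenset((proteins[i], proteins[j])), 0.0)
        sigma[i][j] = sigma[j][i] = score
    return sigma


def t2_statistic(z, sigma):
    """Quadratic form z' sigma^{-1} z, computed without an explicit inverse."""
    weights = solve_linear(sigma, z)
    return sum(zi * wi for zi, wi in zip(z, weights))


# Toy example: two proteins with identical standardized changes.
proteins = ["P1", "P2"]          # hypothetical identifiers
z_scores = [2.0, 2.0]            # hypothetical standardized expression ratios
independent = knowledge_covariance(proteins, {})
linked = knowledge_covariance(proteins, {frozenset(("P1", "P2")): 0.5})
print(t2_statistic(z_scores, independent))
print(t2_statistic(z_scores, linked))
```

In this toy case the statistic is smaller when the two proteins are treated as positively associated than when they are treated as independent, which mirrors the abstract's point that ignoring associations can overstate the evidence. If the standardized ratios were multivariate normal with covariance Σ under the self-contained null, the statistic would follow a chi-square distribution with degrees of freedom equal to the number of proteins; whether T2GA uses exactly that reference distribution is not stated here.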
Pages: 29