BAYESIAN MULTISTUDY FACTOR ANALYSIS FOR HIGH-THROUGHPUT BIOLOGICAL DATA

Cited by: 10
Authors
De Vito, Roberta [1]
Bellio, Ruggero [2]
Trippa, Lorenzo [3]
Parmigiani, Giovanni [4]
Affiliations
[1] Brown Univ, Dept Biostat, Providence, RI 02912 USA
[2] Univ Udine, Dept Econ & Stat, Udine, Italy
[3] Dana Farber Canc Inst, Dept Data Sci, Boston, MA 02115 USA
[4] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA USA
Source
ANNALS OF APPLIED STATISTICS | 2021, Vol. 15, No. 4
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
Dimension reduction; factor analysis; gene expression; Gibbs sampling; meta-analysis; PRINCIPAL COMPONENT ANALYSIS; CANCER; EXPRESSION; SUBTYPES; ROTATION; MODEL; PHOSPHORYLATION; INTEGRATION; PATTERNS; RISK;
DOI
10.1214/21-AOAS1456
Chinese Library Classification
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
This paper analyzes breast cancer gene expression across seven studies to identify genuine and thus replicable gene patterns shared among these studies. Our premise is that genuine biological signal is more likely to be reproducibly present in multiple studies than spurious signal. Our analysis uses a new modeling strategy for the joint analysis of high-throughput biological studies which simultaneously identifies shared as well as study-specific signal. To this end, we generalize the multi-study factor analysis model to handle high-dimensional data and generalize the sparse Bayesian infinite factor model to this context. We provide strategies for the identification of the loading matrices, common and study-specific. Through extensive simulation analysis, we characterize the performance of the proposed approach in various scenarios and show that it outperforms standard factor analysis in identifying replicable signal in all scenarios considered. The analysis of breast cancer gene expression studies identifies clear replicable gene patterns. These patterns are related to well-known biological pathways involved in breast cancer, such as the ER, cell cycle, immune system, collagen, and metabolic pathways. Some of these patterns are also associated with existing breast cancer subtypes, such as LumA, Her2, and basal subtypes, while other patterns identify novel pathways active across subtypes and missed by hierarchical clustering approaches. The R package MSFA implementing the method is available on GitHub.
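The decomposition into shared and study-specific signal described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the MSFA package's API: it assumes the usual multi-study factor form, in which each observation in study s is a shared loading matrix times common factors, plus a study-specific loading matrix times study-specific factors, plus noise. The function names, the standard-normal loadings, and the fixed noise level are all hypothetical choices made for illustration; the sparse infinite-factor prior and Gibbs sampler mentioned in the abstract are not implemented here.

```python
import numpy as np


def simulate_multistudy(n_per_study, p, k_shared, k_specific, noise_sd=0.5, seed=0):
    """Simulate S studies that share one loading matrix and each have their own.

    Assumed model (illustrative): x = phi @ f + lambda_s @ l + e, with
    standard-normal factors and isotropic Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    phi = rng.normal(size=(p, k_shared))  # loadings common to every study
    studies, lambdas = [], []
    for n_s, j_s in zip(n_per_study, k_specific):
        lam = rng.normal(size=(p, j_s))        # loadings unique to this study
        f = rng.normal(size=(n_s, k_shared))   # common factor scores
        l = rng.normal(size=(n_s, j_s))        # study-specific factor scores
        e = rng.normal(scale=noise_sd, size=(n_s, p))
        studies.append(f @ phi.T + l @ lam.T + e)
        lambdas.append(lam)
    return studies, phi, lambdas


def implied_covariance(phi, lam, noise_sd=0.5):
    """Covariance of one study under the assumed model:
    shared part + study-specific part + noise variance on the diagonal."""
    p = phi.shape[0]
    return phi @ phi.T + lam @ lam.T + noise_sd**2 * np.eye(p)


if __name__ == "__main__":
    studies, phi, lambdas = simulate_multistudy(
        n_per_study=[100, 150, 80], p=20, k_shared=3, k_specific=[1, 2, 1]
    )
    for s, x in enumerate(studies):
        print(f"study {s}: data shape {x.shape}, specific factors {lambdas[s].shape[1]}")
```

Under this assumed form, the term built from the shared loadings is identical across studies, which is what makes the shared patterns candidates for replicable signal, while each study's own loadings absorb variation that does not replicate.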
Pages: 1723-1741
Number of pages: 19