BAYESIAN MULTISTUDY FACTOR ANALYSIS FOR HIGH-THROUGHPUT BIOLOGICAL DATA

Cited by: 10
Authors
De Vito, Roberta [1]
Bellio, Ruggero [2]
Trippa, Lorenzo [3]
Parmigiani, Giovanni [4]
Affiliations
[1] Brown Univ, Dept Biostat, Providence, RI 02912 USA
[2] Univ Udine, Dept Econ & Stat, Udine, Italy
[3] Dana Farber Canc Inst, Dept Data Sci, Boston, MA 02115 USA
[4] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA USA
Source
ANNALS OF APPLIED STATISTICS | 2021 / Vol. 15 / No. 4
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
Dimension reduction; factor analysis; gene expression; Gibbs sampling; meta-analysis; PRINCIPAL COMPONENT ANALYSIS; CANCER; EXPRESSION; SUBTYPES; ROTATION; MODEL; PHOSPHORYLATION; INTEGRATION; PATTERNS; RISK;
DOI
10.1214/21-AOAS1456
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
This paper analyzes breast cancer gene expression across seven studies to identify genuine and thus replicable gene patterns shared among these studies. Our premise is that genuine biological signal is more likely to be reproducibly present in multiple studies than spurious signal. Our analysis uses a new modeling strategy for the joint analysis of high-throughput biological studies which simultaneously identifies shared as well as study-specific signal. To this end, we generalize the multi-study factor analysis model to handle high-dimensional data and generalize the sparse Bayesian infinite factor model to this context. We provide strategies for the identification of the loading matrices, common and study-specific. Through extensive simulation analysis, we characterize the performance of the proposed approach in various scenarios and show that it outperforms standard factor analysis in identifying replicable signal in all scenarios considered. The analysis of breast cancer gene expression studies identifies clear replicable gene patterns. These patterns are related to well-known biological pathways involved in breast cancer, such as the ER, cell cycle, immune system, collagen, and metabolic pathways. Some of these patterns are also associated with existing breast cancer subtypes, such as LumA, Her2, and basal subtypes, while other patterns identify novel pathways active across subtypes and missed by hierarchical clustering approaches. The R package MSFA implementing the method is available on GitHub.
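The shared-versus-study-specific decomposition described in the abstract can be illustrated with a small simulation. The sketch below is not the MSFA package and performs no Bayesian estimation; it only generates data under the usual additive multi-study factor form, x = Phi f + Lambda_s l + e, which is assumed here because the abstract does not state the equation. All function names, dimensions, and the isotropic noise level are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_multistudy(n_per_study, p, k_shared, k_specific, noise_sd=0.5, seed=0):
    """Simulate several studies that share one loading matrix (phi) and
    each carry their own study-specific loading matrix (lambda_s).

    Assumed generative form (illustrative only):
        x = phi @ f + lambda_s @ l + e
    with standard normal factors and isotropic Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    phi = rng.normal(size=(p, k_shared))  # loadings common to all studies
    lambdas, studies = [], []
    for n_s, j_s in zip(n_per_study, k_specific):
        lam = rng.normal(size=(p, j_s))        # loadings unique to this study
        f = rng.normal(size=(n_s, k_shared))   # shared factor scores
        l = rng.normal(size=(n_s, j_s))        # study-specific factor scores
        e = rng.normal(scale=noise_sd, size=(n_s, p))
        studies.append(f @ phi.T + l @ lam.T + e)
        lambdas.append(lam)
    return phi, lambdas, studies


def implied_covariance(phi, lam, noise_sd=0.5):
    """Covariance implied by the assumed model for a single study:
    shared part + study-specific part + noise."""
    p = phi.shape[0]
    return phi @ phi.T + lam @ lam.T + noise_sd ** 2 * np.eye(p)


if __name__ == "__main__":
    phi, lambdas, studies = simulate_multistudy(
        n_per_study=[200, 150, 300], p=10, k_shared=3, k_specific=[1, 2, 1]
    )
    for s, (x, lam) in enumerate(zip(studies, lambdas), start=1):
        gap = np.abs(np.cov(x, rowvar=False) - implied_covariance(phi, lam)).max()
        print(f"study {s}: shape={x.shape}, max |sample - implied| = {gap:.3f}")
```

Under this assumed form, the term `phi @ phi.T` appears in every study's covariance, while `lam @ lam.T` differs from study to study; that common term is the kind of replicable signal the abstract refers to.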
Pages: 1723-1741
Page count: 19