BAYESIAN MULTISTUDY FACTOR ANALYSIS FOR HIGH-THROUGHPUT BIOLOGICAL DATA

Cited by: 10
Authors
De Vito, Roberta [1 ]
Bellio, Ruggero [2 ]
Trippa, Lorenzo [3 ]
Parmigiani, Giovanni [4 ]
Affiliations
[1] Brown Univ, Dept Biostat, Providence, RI 02912 USA
[2] Univ Udine, Dept Econ & Stat, Udine, Italy
[3] Dana Farber Canc Inst, Dept Data Sci, Boston, MA 02115 USA
[4] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA USA
Source
ANNALS OF APPLIED STATISTICS | 2021 / Vol. 15 / No. 04
Funding
U.S. National Institutes of Health; U.S. National Science Foundation;
Keywords
Dimension reduction; factor analysis; gene expression; Gibbs sampling; meta-analysis; PRINCIPAL COMPONENT ANALYSIS; CANCER; EXPRESSION; SUBTYPES; ROTATION; MODEL; PHOSPHORYLATION; INTEGRATION; PATTERNS; RISK;
DOI
10.1214/21-AOAS1456
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
This paper analyzes breast cancer gene expression across seven studies to identify genuine and thus replicable gene patterns shared among these studies. Our premise is that genuine biological signal is more likely to be reproducibly present in multiple studies than spurious signal. Our analysis uses a new modeling strategy for the joint analysis of high-throughput biological studies which simultaneously identifies shared as well as study-specific signal. To this end, we generalize the multi-study factor analysis model to handle high-dimensional data and generalize the sparse Bayesian infinite factor model to this context. We provide strategies for the identification of the loading matrices, common and study-specific. Through extensive simulation analysis, we characterize the performance of the proposed approach in various scenarios and show that it outperforms standard factor analysis in identifying replicable signal in all scenarios considered. The analysis of breast cancer gene expression studies identifies clear replicable gene patterns. These patterns are related to well-known biological pathways involved in breast cancer, such as the ER, cell cycle, immune system, collagen, and metabolic pathways. Some of these patterns are also associated with existing breast cancer subtypes, such as LumA, Her2, and basal subtypes, while other patterns identify novel pathways active across subtypes and missed by hierarchical clustering approaches. The R package MSFA implementing the method is available on GitHub.
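The split between shared and study-specific signal described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions: it assumes the usual multi-study factor model form, in which each observation in study s is a shared loading matrix times common factors, plus a study-specific loading matrix times study-specific factors, plus independent noise. The function name, dimensions, and distributions below are hypothetical choices for illustration; they are not taken from the paper or from the MSFA R package, and the code does not perform the Bayesian estimation the paper proposes.

```python
import numpy as np


def simulate_multistudy(n_per_study, p, k_shared, k_specific, seed=0):
    """Simulate data from an assumed multi-study factor model.

    n_per_study: sample size of each study
    p:           number of variables (e.g. genes)
    k_shared:    number of factors common to all studies
    k_specific:  number of study-specific factors, one entry per study
    """
    rng = np.random.default_rng(seed)
    # Shared loadings: identical across studies (the replicable signal).
    phi = rng.normal(size=(p, k_shared))
    studies, covariances = [], []
    for n_s, j_s in zip(n_per_study, k_specific):
        lam = rng.normal(size=(p, j_s))        # study-specific loadings
        psi = rng.uniform(0.5, 1.5, size=p)    # noise variances
        f = rng.normal(size=(n_s, k_shared))   # common factor scores
        l = rng.normal(size=(n_s, j_s))        # study-specific factor scores
        e = rng.normal(size=(n_s, p)) * np.sqrt(psi)
        studies.append(f @ phi.T + l @ lam.T + e)
        # Implied covariance: shared part + study-specific part + noise.
        covariances.append(phi @ phi.T + lam @ lam.T + np.diag(psi))
    return phi, studies, covariances


phi, studies, covariances = simulate_multistudy(
    n_per_study=[30, 40], p=5, k_shared=2, k_specific=[1, 2]
)
```

In this toy setup every study's implied covariance contains the same shared component, while the remaining structure differs by study, which is the distinction the abstract draws between replicable and study-specific patterns.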
Pages: 1723 - 1741
Page count: 19
Related papers
50 in total
  • [31] High-throughput data analysis and data integration for vaccine trials
    Weiner, January, III
    Kaufmann, Stefan H. E.
    Maertzdorf, Jeroen
    [J]. VACCINE, 2015, 33 (40) : 5249 - 5255
  • [32] High-throughput quantitative analysis of pharmaceutical compounds in biological matrices
    Heudi, Olivier
    [J]. BIOANALYSIS, 2011, 3 (08) : 819 - 821
  • [33] High-throughput mass spectrometric analysis of xenobiotics in biological fluids
    Bakhtiar, R
    Ramos, L
    Tse, FLS
    [J]. JOURNAL OF LIQUID CHROMATOGRAPHY & RELATED TECHNOLOGIES, 2002, 25 (04) : 507 - 540
  • [34] Correction to: Bayesian functional regression as an alternative statistical analysis of high-throughput phenotyping data of modern agriculture
    Abelardo Montesinos-López
    Osval A. Montesinos-López
    Gustavo de los Campos
    José Crossa
    Juan Burgueño
    Francisco Javier Luna-Vazquez
    [J]. Plant Methods, 14
  • [35] The Application of Cheminformatics in the Analysis of High-Throughput Screening Data
    Walters, W. Patrick
    Aronov, Alexander
    Goldman, Brian
    McClain, Brian
    Perola, Emanuele
    Weiss, Jonathan
    [J]. FRONTIERS IN MOLECULAR DESIGN AND CHEMICAL INFORMATION SCIENCE - HERMAN SKOLNIK AWARD SYMPOSIUM 2015: JURGEN BAJORATH, 2016, 1222 : 269 - 282
  • [36] Need for speed in high-throughput sequencing data analysis
    Pluss, M.
    Caspar, S. M.
    Meienberg, J.
    Kopps, A. M.
    Keller, I.
    Bruggmann, R.
    Vogel, M.
    Matyas, G.
    [J]. EUROPEAN JOURNAL OF HUMAN GENETICS, 2018, 26 : 721 - 722
  • [37] High-throughput metaproteomics data analysis with Unipept: A tutorial
    Mesuere, Bart
    Van der Jeugt, Felix
    Willems, Toon
    Naessens, Tom
    Devreese, Bart
    Martens, Lennart
    Dawyndt, Peter
    [J]. JOURNAL OF PROTEOMICS, 2018, 171 : 11 - 22
  • [38] Computational Methods for Analysis of High-Throughput Screening Data
    Balakin, Konstantin V.
    Savchuk, Nikolay P.
    [J]. CURRENT COMPUTER-AIDED DRUG DESIGN, 2006, 2 (01) : 1 - 19
  • [39] Sparse Canonical Covariance Analysis for High-throughput Data
    Lee, Woojoo
    Lee, Donghwan
    Lee, Youngjo
    Pawitan, Yudi
    [J]. STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, 2011, 10 (01)
  • [40] Feature cluster selection for high-throughput data analysis
    Yu, Lei
    Li, Hao
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, PROCEEDINGS, 2007, : 9 - 14