Dirichlet-multinomial modelling outperforms alternatives for analysis of microbiome and other ecological count data

Cited by: 26
Authors
Harrison, Joshua G. [1]
Calder, W. John [1]
Shastry, Vivaswat [1]
Buerkle, C. Alex [1]
Affiliations
[1] Univ Wyoming, Dept Bot, 3165,1000 E Univ Ave, Laramie, WY 82071 USA
Funding
National Science Foundation (USA)
Keywords
Bayesian statistics; compositional data analysis; Dirichlet; Hamiltonian Monte Carlo; hierarchical modelling; JAGS; Markov chain Monte Carlo; microbial ecology; microbiome; multinomial; Stan; transcriptome; variational inference; COMPOSITIONAL DATA-ANALYSIS; VARIATIONAL INFERENCE; VARIABLE SELECTION; ASPIRATION; PNEUMONIA; REGRESSION; STANDARDS; ABUNDANCE; RISK
DOI
10.1111/1755-0998.13128
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Molecular ecology regularly requires the analysis of count data that reflect the relative abundance of features of a composition (e.g., taxa in a community, gene transcripts in a tissue). The sampling process that generates these data can be modelled using the multinomial distribution. Replicate multinomial samples inform the relative abundances of features in an underlying Dirichlet distribution. These distributions together form a hierarchical model for relative abundances among replicates and sampling groups. This type of Dirichlet-multinomial modelling (DMM) has been described previously, but its benefits and limitations are largely untested. With simulated data, we quantified the ability of DMM to detect differences in proportions between treatment and control groups, and compared the efficacy of three computational methods to implement DMM: Hamiltonian Monte Carlo (HMC), variational inference (VI), and Gibbs Markov chain Monte Carlo. We report that DMM was better able to detect shifts in relative abundances than analogous analytical tools, while identifying an acceptably low number of false positives. Among methods for implementing DMM, HMC provided the most accurate estimates of relative abundances, and VI was the most computationally efficient. The sensitivity of DMM was exemplified through analysis of previously published data describing lung microbiomes. We report that DMM identified several potentially pathogenic bacterial taxa as more abundant in the lungs of children who aspirated foreign material during swallowing; these differences went undetected with different statistical approaches. Our results suggest that DMM has strong potential as a statistical method to guide inference in molecular ecology.
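The hierarchical structure described in the abstract (a group-level Dirichlet distribution from which each replicate's multinomial probabilities are drawn) can be illustrated with a small simulation. The following is a minimal sketch using only the Python standard library, not the authors' implementation: the function names, the parameterisation `alpha = theta * pi`, and all numeric values (proportions, `theta`, replicate and read counts) are assumptions chosen for illustration. It covers only the data-generating side of the model; fitting by HMC, VI, or Gibbs sampling is not shown.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalised gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_multinomial(n_reads, probs, rng):
    """Draw multinomial counts by assigning n_reads reads to features with weights probs."""
    counts = [0] * len(probs)
    for idx in rng.choices(range(len(probs)), weights=probs, k=n_reads):
        counts[idx] += 1
    return counts


def simulate_group(pi, theta, n_replicates, n_reads, rng):
    """Simulate one sampling group.

    Assumed parameterisation (hypothetical, for illustration):
      p_i ~ Dirichlet(theta * pi)        replicate-level proportions
      y_i ~ Multinomial(n_reads, p_i)    observed counts for replicate i
    A larger theta means replicates vary less around the group proportions pi.
    """
    alpha = [theta * p for p in pi]
    return [
        sample_multinomial(n_reads, sample_dirichlet(alpha, rng), rng)
        for _ in range(n_replicates)
    ]


rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical group-level relative abundances for three features.
control_pi = [0.5, 0.3, 0.2]
treatment_pi = [0.3, 0.3, 0.4]

control = simulate_group(control_pi, theta=50.0, n_replicates=5, n_reads=1000, rng=rng)
treatment = simulate_group(treatment_pi, theta=50.0, n_replicates=5, n_reads=1000, rng=rng)

# Pooled observed proportions per group, as a crude summary of the simulated counts.
for label, group in (("control", control), ("treatment", treatment)):
    pooled = [sum(col) for col in zip(*group)]
    total = sum(pooled)
    print(label, [round(c / total, 3) for c in pooled])
```

Each row of `control` and `treatment` is one replicate's count vector. A DMM analysis would estimate the group-level proportions from such counts and compare their posterior distributions between groups.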
Pages: 481-497
Number of pages: 17