An efficient algorithm for accurate computation of the Dirichlet-multinomial log-likelihood function

Cited by: 14
Authors
Yu, Peng [1 ,2 ]
Shaw, Chad A. [3 ]
Affiliations
[1] Texas A&M Univ, Dept Elect & Comp Engn, College Stn, TX 77843 USA
[2] Texas A&M Univ, TEES AgriLife Ctr Bioinformat & Genom Syst Engn C, College Stn, TX 77843 USA
[3] Baylor Coll Med, Dept Mol & Human Genet, Houston, TX 77030 USA
Keywords
DIFFERENTIAL EXPRESSION ANALYSIS; GOODNESS-OF-FIT; OVERDISPERSION; PACKAGE; MODEL
DOI
10.1093/bioinformatics/btu079
CLC classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The Dirichlet-multinomial (DMN) distribution is a fundamental model for multicategory count data with overdispersion. This distribution has many uses in bioinformatics, including applications to metagenomics data, transcriptomics and alternative splicing. The DMN distribution reduces to the multinomial distribution when the overdispersion parameter ψ is 0. Unfortunately, computing the DMN log-likelihood function numerically by conventional methods is unstable in the neighborhood of ψ = 0. An alternative formulation circumvents this instability, but its long runtimes make it impractical for the large count data common in bioinformatics. We have developed a new method for computing the DMN log-likelihood that solves the instability problem without incurring long runtimes. The new approach consists of a novel formula and an algorithm that extends its applicability. Our numerical experiments show that this new method improves both the accuracy of log-likelihood evaluation and the runtime by several orders of magnitude, especially in the high-count situations that are common in deep sequencing data. On real metagenomic data, our method achieves a manyfold runtime improvement. Our method makes it more feasible to use the DMN distribution to model many high-throughput problems in bioinformatics. We have included in our work an R package giving access to this method and a vignette applying this approach to metagenomic data.
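The abstract contrasts a conventional formulation with an alternative one but does not write either out. The minimal Python sketch below illustrates those two pre-existing formulations. It does not implement the authors' new formula, which the abstract does not state.

The sketch makes the following assumptions:
- It uses the common parameterization α_j = p_j/ψ, with the p_j summing to 1 and ψ > 0.
- It omits the multinomial coefficient, since that term does not depend on the parameters.
- The function names dmn_loglik_lgamma and dmn_loglik_product are hypothetical.

The first function uses log-gamma differences. As ψ approaches 0, these subtract very large, nearly equal numbers. The second function expands each gamma ratio Γ(x + a)/Γ(a) into the product (a)(a + 1)...(a + x - 1). This lets the log ψ terms cancel, so the expression stays defined at ψ = 0, but it costs one logarithm per unit of count.

```python
import math


def dmn_loglik_lgamma(x, p, psi):
    """Conventional log-gamma form of the DMN log-likelihood.

    Assumes alpha_j = p_j / psi with sum(p) == 1; the multinomial
    coefficient is omitted. Undefined at psi == 0 (division by zero),
    and the large lgamma terms nearly cancel as psi -> 0.
    """
    n = sum(x)
    a = 1.0 / psi
    ll = math.lgamma(a) - math.lgamma(n + a)
    for xj, pj in zip(x, p):
        ll += math.lgamma(xj + pj * a) - math.lgamma(pj * a)
    return ll


def dmn_loglik_product(x, p, psi):
    """Alternative sum-of-logs form of the same log-likelihood.

    Uses Gamma(x + a) / Gamma(a) = prod_{i=0}^{x-1} (a + i); after the
    log(psi) factors cancel this becomes
        sum_j sum_{i<x_j} log(p_j + i*psi) - sum_{i<N} log(1 + i*psi).
    At psi == 0 it equals the multinomial sum_j x_j*log(p_j), but the
    loops perform O(sum(x)) log evaluations.
    """
    n = sum(x)
    ll = 0.0
    for xj, pj in zip(x, p):
        for i in range(xj):
            ll += math.log(pj + i * psi)
    for i in range(n):
        ll -= math.log1p(i * psi)  # log(1 + i*psi)
    return ll


if __name__ == "__main__":
    x = [120, 45, 35]        # hypothetical counts in three categories
    p = [0.6, 0.25, 0.15]    # hypothetical category proportions
    for psi in (1e-2, 1e-8, 1e-12):
        print(psi, dmn_loglik_lgamma(x, p, psi), dmn_loglik_product(x, p, psi))
    print("psi = 0:", dmn_loglik_product(x, p, 0.0))
```

For moderate ψ the two functions should agree closely. As ψ shrinks, one would expect the log-gamma version to lose digits. The product version stays accurate, but its cost grows linearly with the total count N, which is the runtime problem the abstract describes for deep sequencing data.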
Pages: 1547-1554
Page count: 8