Tweedie Distributions for Biological Sequences Alignments

Times cited: 0
Authors
Ben Hassen, Hanen [1]
Masmoudi, Khalil [1]
Masmoudi, Afif [1]
Affiliation
[1] Univ Sfax, Fac Sci Sfax, Lab Probabil & Stat, PB 1171, Sfax 3000, Tunisia
Keywords
Gapped aligned sequences; Pairwise alignment; Scores distribution; Dispersion models; Parameters estimation; Statistical significance; Local alignment; P-values
DOI
10.1007/s12561-023-09388-4
Chinese Library Classification (CLC)
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
An important technique in studying the similarity between biological sequences is the analysis of the distribution of their alignment scores. Estimating this distribution plays a central role in evaluating the statistical significance of the alignments. In amino acid sequence alignment, the scores of ungapped aligned segments have been proven to be asymptotically distributed according to the extreme value law, whereas gapped alignment scores are generally fitted with Poisson or Gumbel distributions. To widen the range of candidate distributions, other classes of statistical models can be used. In this paper, we propose using the class of exponential dispersion models, which includes several common laws such as the Gaussian, Poisson and Gamma distributions, among many others. In this context, we introduce a new algorithm for estimating the parameters of this model. The proposed approach is based on selecting the appropriate distribution and on maximum likelihood estimation. An asymptotic confidence interval is provided for the dispersion parameter. Finally, the performance of the proposed algorithm is evaluated through numerical experiments on random sequences generated with different techniques.
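The abstract does not spell out the estimation algorithm, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It fits three exponential dispersion (Tweedie) members named in the abstract by maximum likelihood, Gaussian (power p = 0), Poisson (p = 1) and Gamma (p = 2), keeps the one with the lowest AIC, and reports a Wald asymptotic confidence interval for the dispersion parameter phi, where Var(Y) = phi * mu^p. Using AIC as the selection criterion, the delta-method interval, the series approximations of the digamma and trigamma functions, and all function names (`fit_gaussian`, `fit_gamma`, `select_and_fit`, ...) are assumptions made for this example. It also assumes the scores are not all equal, and for integer-valued scores, comparing a Poisson probability mass against Gaussian or Gamma densities is only a heuristic.

```python
import math
import random
from statistics import NormalDist


def digamma(x: float) -> float:
    """psi(x) for x > 0: shift x upward with psi(x) = psi(x + 1) - 1/x, then use the asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def trigamma(x: float) -> float:
    """psi'(x) for x > 0, using psi'(x) = psi'(x + 1) + 1/x^2 and the asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv, inv2 = 1.0 / x, 1.0 / (x * x)
    return acc + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))


def fit_gaussian(xs):
    """Tweedie power p = 0: Var(Y) = phi, with phi = sigma^2."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n                  # MLE of sigma^2
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
    se_phi = var * math.sqrt(2 / n)                           # asymptotic s.e. of sigma^2-hat
    return {"name": "Gaussian", "n_params": 2, "loglik": loglik, "phi": var, "se_phi": se_phi}


def fit_poisson(xs):
    """Tweedie power p = 1: Var(Y) = mu, so phi is fixed at 1 (not estimated)."""
    lam = sum(xs) / len(xs)
    loglik = sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in xs)
    return {"name": "Poisson", "n_params": 1, "loglik": loglik, "phi": 1.0, "se_phi": 0.0}


def fit_gamma(xs):
    """Tweedie power p = 2: Var(Y) = phi * mu^2, with phi = 1 / shape."""
    n = len(xs)
    mean = sum(xs) / n
    sum_log = sum(math.log(x) for x in xs)
    s = math.log(mean) - sum_log / n                          # > 0 unless all xs are equal
    k = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)  # Minka's starting value
    for _ in range(100):                                      # Newton on log(k) - psi(k) = s
        step = (math.log(k) - digamma(k) - s) / (1 / k - trigamma(k))
        k = max(k - step, 0.5 * k)                            # keep the shape positive
        if abs(step) < 1e-12 * k:
            break
    theta = mean / k                                          # MLE of the scale
    loglik = (k - 1) * sum_log - n * k - n * k * math.log(theta) - n * math.lgamma(k)
    var_k = 1 / (n * (trigamma(k) - 1 / k))                   # inverse Fisher information for k
    se_phi = math.sqrt(var_k) / k ** 2                        # delta method for phi = 1/k
    return {"name": "Gamma", "n_params": 2, "loglik": loglik, "phi": 1 / k, "se_phi": se_phi}


def select_and_fit(xs, level=0.95):
    """Fit each admissible candidate by ML, keep the lowest AIC, add a Wald CI for phi."""
    fits = [fit_gaussian(xs)]
    if all(x >= 0 and float(x).is_integer() for x in xs) and sum(xs) > 0:
        fits.append(fit_poisson(xs))                          # only for count-like scores
    if min(xs) > 0 and max(xs) > min(xs):
        fits.append(fit_gamma(xs))                            # only for positive scores
    for f in fits:
        f["aic"] = 2 * f["n_params"] - 2 * f["loglik"]
    best = min(fits, key=lambda f: f["aic"])
    z = NormalDist().inv_cdf(0.5 + level / 2)                 # e.g. about 1.96 for level = 0.95
    best["ci_phi"] = (best["phi"] - z * best["se_phi"], best["phi"] + z * best["se_phi"])
    return best


if __name__ == "__main__":
    random.seed(1)
    scores = [random.gammavariate(2.0, 3.0) for _ in range(2000)]  # simulated positive scores
    best = select_and_fit(scores)
    print(best["name"], round(best["phi"], 3), best["ci_phi"])
```

AIC penalizes the extra estimated parameter of the Gaussian and Gamma fits relative to the Poisson fit. A fuller Tweedie treatment would also estimate the power p on a continuum rather than choosing among three fixed values, which this sketch does not attempt.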
Pages: 165-184
Number of pages: 20
Related papers (50 in total)
  • [1] Tweedie Distributions for Biological Sequences Alignments
    Ben Hassen Hanen
    Masmoudi Khalil
    Masmoudi Afif
    Statistics in Biosciences, 2024, 16 : 165 - 184
  • [2] OPTIMAL ALIGNMENTS OF BIOLOGICAL SEQUENCES ON A MICROCOMPUTER
    WATANABE, K
    URANO, Y
    TAMAOKI, T
    COMPUTER APPLICATIONS IN THE BIOSCIENCES, 1985, 1 (02): : 83 - 87
  • [3] Enumerating suboptimal alignments of multiple biological sequences efficiently
    Shibuya, T
    Imai, H
    PACIFIC SYMPOSIUM ON BIOCOMPUTING '97, 1996, : 409 - 420
  • [4] Domains of attraction to Tweedie distributions
    B. Jørgensen
    J. R. Martínez
    V. Vinogradov
    Lithuanian Mathematical Journal, 2009, 49 : 399 - 425
  • [6] On Suboptimal LCS-alignments for Independent Bernoulli Sequences with Asymmetric Distributions
    Stanislaw Barder
    Jüri Lember
    Heinrich Matzinger
    Märt Toots
    Methodology and Computing in Applied Probability, 2012, 14 : 357 - 382
  • [8] Retrieving Myers-Miller Alignments for Pairwise Biological Sequences Using Spark
    Zhu, Xiangyuan
    Li, Bing
    Li, Jian
    Li, Kenli
    2017 13TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2017, : 2049 - 2053
  • [9] Using Tweedie distributions for fitting spike count data
    Moshitch, Dina
    Nelken, Israel
    JOURNAL OF NEUROSCIENCE METHODS, 2014, 225 : 13 - 28
  • [10] Creation of a Database Including a Set of Biological Features Related to Protein Sequences and their Corresponding Alignments
    Ortuno, F.
    Pomares, H.
    Rojas, I.
    Valenzuela, O.
    2014 IEEE 27TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS), 2014, : 557 - +