Tweedie Distributions for Biological Sequences Alignments

Cited by: 0
Authors
Hanen, Ben Hassen [1]
Khalil, Masmoudi [1]
Afif, Masmoudi [1]
Affiliation
[1] Univ Sfax, Fac Sci Sfax, Lab Probabil & Stat, PB 1171, Sfax 3000, Tunisia
Keywords
Gapped aligned sequences; Pairwise alignment; Scores distribution; Dispersion models; Parameters estimation; Statistical significance; Local alignment; P-values
DOI
10.1007/s12561-023-09388-4
Chinese Library Classification number
Q [Biological sciences]
Discipline codes
07; 0710; 09
Abstract
An important technique in the study of similarity between biological sequences is the analysis of the distribution of their alignment scores. Estimating this distribution plays a central role in evaluating the statistical significance of the alignments. In amino acid sequence alignment, the scores of ungapped aligned segments have been proven to follow, asymptotically, the extreme value law. Scores of gapped alignments, however, are generally fitted with Poisson or Gumbel distributions. To widen the range of candidate distributions, other classes of statistical models can be used. In this paper, we propose using the class of exponential dispersion models, which includes several common laws, such as the Gaussian, Poisson and Gamma distributions, among many others. In this context, a new algorithm for estimating the parameters of this model is introduced. The proposed approach is based on selecting the appropriate distribution and on maximum likelihood estimation. An asymptotic confidence interval is provided for the dispersion parameter. Finally, the performance of the suggested algorithm is evaluated through several numerical experiments on random sequences produced by different generation techniques.
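The abstract does not spell out the authors' algorithm, so the following is only a rough illustration of the general idea (choose a member of the Tweedie subclass, where Var(Y) = φ μ^p, by maximum likelihood, then give an asymptotic interval for the dispersion φ). It assumes positive, continuous, non-constant scores and restricts the candidates to three members whose MLEs are closed-form or easy to solve: normal (p = 0), gamma (p = 2) and inverse Gaussian (p = 3). The Poisson member (p = 1) is left out because its likelihood is a probability mass function and cannot be compared directly with densities. Selection uses AIC, and the intervals are Wald intervals built from the Fisher information and the delta method. All function names (`fit_gamma`, `select_and_fit`, etc.) are hypothetical.

```python
import math
import random

# Hypothetical sketch, not the authors' algorithm: fit three continuous Tweedie
# members (Var(Y) = phi * mu**p), pick one by AIC, give a Wald CI for phi.


def digamma(x):
    """psi(x) for x > 0: shift x upward, then apply the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def trigamma(x):
    """psi'(x) for x > 0, computed the same way."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)  # psi'(x) = psi'(x + 1) + 1/x**2
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + 1 / x + 0.5 * inv2 + inv2 / x * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))


# Each fit returns (max log-likelihood, phi_hat, asymptotic variance of phi_hat).
def fit_normal(xs):
    n = len(xs)
    mu = sum(xs) / n
    phi = sum((x - mu) ** 2 for x in xs) / n  # phi = sigma^2 (MLE)
    loglik = -0.5 * n * (math.log(2 * math.pi * phi) + 1)
    return loglik, phi, 2 * phi ** 2 / n


def fit_gamma(xs):
    n = len(xs)
    mean = sum(xs) / n
    s = math.log(mean) - sum(math.log(x) for x in xs) / n  # > 0 unless constant
    k = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)  # Minka's start
    for _ in range(50):  # Newton on log(k) - psi(k) = s
        step = (math.log(k) - digamma(k) - s) / (1 / k - trigamma(k))
        k -= step
        if abs(step) < 1e-12 * k:
            break
    theta = mean / k  # scale
    loglik = sum((k - 1) * math.log(x) - x / theta for x in xs) - n * (k * math.log(theta) + math.lgamma(k))
    var_phi = 1 / (n * k ** 3 * (k * trigamma(k) - 1))  # delta method, phi = 1/k
    return loglik, 1 / k, var_phi


def fit_inverse_gaussian(xs):
    n = len(xs)
    mu = sum(xs) / n
    phi = sum(1 / x - 1 / mu for x in xs) / n  # phi = 1/lambda (MLE)
    lam = 1 / phi
    loglik = sum(0.5 * math.log(lam / (2 * math.pi * x ** 3))
                 - lam * (x - mu) ** 2 / (2 * mu ** 2 * x) for x in xs)
    return loglik, phi, 2 * phi ** 2 / n


FAMILIES = {
    "normal (p=0)": fit_normal,
    "gamma (p=2)": fit_gamma,
    "inverse Gaussian (p=3)": fit_inverse_gaussian,
}


def select_and_fit(scores, z=1.959964):
    """Return (family, phi_hat, Wald CI for phi) for the lowest-AIC family."""
    if min(scores) <= 0 or min(scores) == max(scores):
        raise ValueError("sketch assumes positive, non-constant scores")
    results = {}
    for name, fit in FAMILIES.items():
        loglik, phi, var_phi = fit(scores)
        results[name] = (2 * 2 - 2 * loglik, phi, var_phi)  # 2 parameters each
    best = min(results, key=lambda name: results[name][0])
    _, phi, var_phi = results[best]
    half = z * math.sqrt(var_phi)  # Wald interval; lower end can be < 0 if n is small
    return best, phi, (phi - half, phi + half)


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated positive values only, NOT real alignment scores.
    scores = [rng.gammavariate(5.0, 4.0) for _ in range(2000)]
    print(select_and_fit(scores))
```

Because all three candidates have two parameters, AIC here ranks them exactly as the maximized log-likelihood would; a penalty matters only once families with different parameter counts (or a free power p) are compared.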
Pages: 165-184
Page count: 20