Tweedie Distributions for Biological Sequences Alignments

Authors
Hanen, Ben Hassen [1]
Khalil, Masmoudi [1]
Afif, Masmoudi [1]
Affiliations
[1] Univ Sfax, Fac Sci Sfax, Lab Probabil & Stat, PB 1171, Sfax 3000, Tunisia
Keywords
Gapped aligned sequences; Pairwise alignment; Scores distribution; Dispersion models; Parameters estimation; Statistical significance; Local alignment; P-values
DOI
10.1007/s12561-023-09388-4
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
An important technique for studying the similarity between biological sequences is the analysis of their alignment score distribution. Estimating this distribution plays a central role in evaluating the statistical significance of the alignments. For amino acid sequence alignments, the scores of ungapped aligned segments are proven to be asymptotically distributed according to the extreme value law. Gapped alignment scores are generally fitted with Poisson or Gumbel distributions. To widen the scope of candidate distributions, other classes of statistical models can be used. In this paper, we propose the class of exponential dispersion models, which includes several common laws such as the Gaussian, Poisson, and Gamma distributions, among many others. In this context, a new algorithm for estimating the model parameters is introduced. The proposed approach is based on selecting the appropriate distribution and on maximum likelihood estimation. An asymptotic confidence interval is provided for the dispersion parameter. Finally, the performance of the suggested algorithm is evaluated through numerical experiments on random sequences generated with different techniques.
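The idea of "selecting the appropriate distribution and maximum likelihood estimation" can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not detail. It assumes synthetic positive scores (hypothetical data, not real alignment scores), restricts the candidates to three continuous members of the Tweedie family (Gaussian, Gamma, inverse Gaussian), fits each by maximum likelihood with SciPy, and ranks them by AIC. The helper name `fit_candidates` and the choice of AIC as the selection criterion are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def fit_candidates(scores):
    """Fit each candidate law by maximum likelihood and report its AIC.

    Candidates are three continuous Tweedie members (variance power
    p = 0, 2, 3). Location is fixed at 0 for the positive-support laws,
    so every candidate has two free parameters.
    """
    scores = np.asarray(scores, dtype=float)
    candidates = {
        "gaussian": (stats.norm, {}),
        "gamma": (stats.gamma, {"floc": 0}),
        "inverse_gaussian": (stats.invgauss, {"floc": 0}),
    }
    n_free = 2  # free parameters per candidate after fixing loc
    results = {}
    for name, (dist, fit_kwargs) in candidates.items():
        params = dist.fit(scores, **fit_kwargs)       # MLE estimates
        loglik = dist.logpdf(scores, *params).sum()   # maximized log-likelihood
        results[name] = {
            "params": params,
            "loglik": loglik,
            "aic": 2 * n_free - 2 * loglik,
        }
    return results


# Hypothetical scores drawn from a Gamma law (shape 5, scale 2).
rng = np.random.default_rng(0)
scores = rng.gamma(shape=5.0, scale=2.0, size=5000)

results = fit_candidates(scores)
best = min(results, key=lambda name: results[name]["aic"])
print("selected:", best)
for name, res in results.items():
    print(f"{name:>16s}  AIC = {res['aic']:.1f}")
```

With real data, `scores` would be replaced by the observed alignment scores. A fuller treatment would estimate the Tweedie variance power directly rather than compare a fixed shortlist.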
Pages: 165-184
Number of pages: 20