Sigma: multiple alignment of weakly-conserved non-coding DNA sequence

Cited by: 19
Authors
Siddharthan, Rahul [1]
Affiliation
[1] Inst Math Sci, Madras 600113, Tamil Nadu, India
DOI
10.1186/1471-2105-7-143
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Existing tools for multiple-sequence alignment focus on aligning protein sequence or protein-coding DNA sequence, and are often based on extensions to Needleman-Wunsch-like pairwise alignment methods. We introduce a new tool, Sigma, with a new algorithm and scoring scheme designed specifically for non-coding DNA sequence. This problem acquires importance with the increasing number of published sequences of closely-related species. In particular, studies of gene regulation seek to take advantage of comparative genomics, and recent algorithms for finding regulatory sites in phylogenetically-related intergenic sequence require alignment as a preprocessing step. Much can also be learned about evolution from intergenic DNA, which tends to evolve faster than coding DNA. Sigma uses a strategy of seeking the best possible gapless local alignments (a strategy earlier used by DiAlign), at each step making the best possible alignment consistent with existing alignments, and scores the significance of the alignment based on the lengths of the aligned fragments and a background model which may be supplied or estimated from an auxiliary file of intergenic DNA.

Results: Comparative tests of Sigma with five earlier algorithms on synthetic data generated to mimic real data show excellent performance, with Sigma balancing high "sensitivity" (more bases aligned) with effective filtering of "incorrect" alignments. With real data, while "correctness" cannot be directly quantified for the alignment, running the PhyloGibbs motif finder on pre-aligned sequence suggests that Sigma's alignments are superior.

Conclusion: By taking into account the peculiarities of non-coding DNA, Sigma fills a gap in the toolbox of bioinformatics.
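The idea of scoring a gapless fragment against a background model can be illustrated with a minimal Python sketch. This is not Sigma's actual algorithm or scoring scheme, which the abstract does not specify in detail. It assumes a simple independent-base background and a binomial-tail p-value, and all function names (`background_freqs`, `match_prob`, `fragment_pvalue`, `best_gapless_fragment`) and the `min_len` parameter are hypothetical.

```python
from collections import Counter
from math import comb


def background_freqs(seq):
    """Estimate single-base background frequencies from an auxiliary sequence."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def match_prob(freqs):
    """Probability that two independent background bases are identical."""
    return sum(p * p for p in freqs.values())


def fragment_pvalue(length, matches, q):
    """Binomial tail: chance of at least `matches` identities in `length` columns."""
    return sum(
        comb(length, k) * q**k * (1 - q) ** (length - k)
        for k in range(matches, length + 1)
    )


def best_gapless_fragment(a, b, q, min_len=4):
    """Brute-force search for the gapless fragment pair with the lowest p-value.

    Returns (p_value, start_in_a, start_in_b, length), or None if no fragment
    of at least `min_len` columns exists. Assumed scoring, for illustration only.
    """
    best = None
    for i in range(len(a)):
        for j in range(len(b)):
            matches = 0
            max_len = min(len(a) - i, len(b) - j)
            for n in range(1, max_len + 1):
                matches += a[i + n - 1] == b[j + n - 1]
                if n >= min_len:
                    p = fragment_pvalue(n, matches, q)
                    if best is None or p < best[0]:
                        best = (p, i, j, n)
    return best


# Usage: estimate the background from a toy "auxiliary" sequence, then search.
q = match_prob(background_freqs("ACGTACGTACGT"))
print(best_gapless_fragment("TTACGTACGG", "CCACGTACAA", q))
```

The brute-force scan is cubic in sequence length and is only meant to make the scoring idea concrete. A full multiple aligner would also need to check each new fragment for consistency with the fragments already accepted, which this sketch omits.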
Pages: 15
Related papers
50 in total
  • [21] Genome-wide identification of coding and non-coding conserved sequence tags in human and mouse genomes
    Mignone, Flavio
    Anselmo, Anna
    Donvito, Giacinto
    Maggi, Giorgio P.
    Grillo, Giorgio
    Pesole, Graziano
    BMC GENOMICS, 2008, 9 (1)
  • [22] Genome-wide identification of coding and non-coding conserved sequence tags in human and mouse genomes
    Flavio Mignone
    Anna Anselmo
    Giacinto Donvito
    Giorgio P Maggi
    Giorgio Grillo
    Graziano Pesole
    BMC Genomics, 9
  • [23] DISRUPTION OF HIGHLY CONSERVED NON-CODING DNA ELEMENTS IN CONGENITAL HEART DEFECTS
    Krantz, I. D.
    Francey, L.
    Holst, J.
    Conlin, L.
    Spinner, N.
    Juhr, D.
    Gruber, P. J.
    PEDIATRIC RESEARCH, 2010, 68 : 151 - 151
  • [24] Comparative analysis of vertebrate Shh genes identifies novel conserved non-coding sequence
    Debbie K. Goode
    Philip K. Snell
    Greg K. Elgar
    Mammalian Genome, 2003, 14 : 192 - 201
  • [25] Comparative analysis of vertebrate Shh genes identifies novel conserved non-coding sequence
    Goode, DK
    Snell, P
    Elgar, G
    MAMMALIAN GENOME, 2003, 14 (03) : 192 - 201
  • [26] Prioritizing sequence variants in conserved non-coding elements in the chicken genome using chCADD
    Gross, Christian
    Bortoluzzi, Chiara
    de Ridder, Dick
    Megens, Hendrik-Jan
    Groenen, Martien A. M.
    Reinders, Marcel
    Bosse, Mirte
    PLOS GENETICS, 2020, 16 (09):
  • [27] Conserved non-coding sequences and transcriptional regulation
    Straehle, Uwe
    Rastegar, Sepand
    BRAIN RESEARCH BULLETIN, 2008, 75 (2-4) : 225 - 230
  • [28] Non-coding DNA adapts
    Phillips M.L.
    Genome Biology, 5 (1)
  • [29] A New Approach for Parameter Estimation in the Sequence-Structure Alignment of Non-Coding RNAs
    Song, Yinglei
    Chi, Albert Y.
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2015, 31 (02) : 597 - 607
  • [30] Fugu and human sequence comparison identifies novel human genes and conserved non-coding sequences
    Gilligan, P
    Brenner, S
    Venkatesh, B
    GENE, 2002, 294 (1-2) : 35 - 44