NcDNAlign: Plausible multiple alignments of non-protein-coding genomic sequences

Cited by: 8
Authors
Rose, Dominic
Hertel, Jana
Reiche, Kristin [1]
Stadler, Peter F. [1, 2, 3]
Hackermueller, Joerg [1]
Affiliations
[1] Fraunhofer Inst Cell Therapy & Immunol IZI, D-04103 Leipzig, Germany
[2] Univ Vienna, Dept Theoret Chem, A-1090 Vienna, Austria
[3] Santa Fe Inst, Santa Fe, NM 87501 USA
Keywords
non-coding RNA; ncRNA; alignment; multiple sequence alignments; ultra-conserved elements; ultra-conserved regions; UCE; UCR; CNE; genome annotation; comparative genomics
DOI
10.1016/j.ygeno.2008.04.003
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Genome-wide multiple sequence alignments (MSAs) are a necessary prerequisite for an increasingly diverse collection of comparative genomic approaches. Here we present a versatile method that generates high-quality MSAs for non-protein-coding sequences. The NcDNAlign pipeline combines pairwise BLAST alignments to create initial MSAs, which are then locally improved and trimmed. The program is optimized for speed and hence is particularly well-suited to pilot studies. We demonstrate the practical use of NcDNAlign in three case studies: the search for ncRNAs in gammaproteobacteria and the analysis of conserved noncoding DNA in nematodes and teleost fish, in the latter case focusing on the fate of duplicated ultra-conserved regions. Compared to the currently widely used genome-wide alignment program TBA, our program results in a 20- to 30-fold reduction of CPU time necessary to generate gammaproteobacterial alignments. A showcase application of bacterial ncRNA prediction based on alignments of both algorithms results in similar sensitivity, false discovery rates, and up to 100 putatively novel ncRNA structures. Similar findings hold for our application of NcDNAlign to the identification of ultra-conserved regions in nematodes and teleosts. Both approaches yield conserved sequences of unknown function, result in novel evolutionary insights into conservation patterns among these genomes, and manifest the benefits of an efficient and reliable genome-wide alignment package. The software is available under the GNU Public License at http://www.bioinf.uni-leipzig.de/Software/NcDNAlign/. © 2008 Elsevier Inc. All rights reserved.
Pages: 65-74
Number of pages: 10