Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions

Times cited: 60
Authors
Torarinsson, Elfar [1, 2]
Yao, Zizhen [3]
Wiklund, Eric D. [4]
Bramsen, Jesper B. [4]
Hansen, Claus [5]
Kjems, Jorgen [4]
Tommerup, Niels [5]
Ruzzo, Walter L. [3, 6]
Gorodkin, Jan [1]
Affiliations
[1] Univ Copenhagen, Fac Life Sci, IBVH, Sect Genet & Bioinformat, DK-1870 Frederiksberg C, Denmark
[2] Univ Copenhagen, Dept Nat Sci, Fac Life Sci, DK-1871 Frederiksberg C, Denmark
[3] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
[4] Aarhus Univ, Dept Mol Biol, DK-8000 Aarhus, Denmark
[5] Univ Copenhagen, Dept Cellular & Mol Med, Wilhelm Johannsen Ctr Funct Genome Res, DK-2200 Copenhagen, Denmark
[6] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
DOI
10.1101/gr.6887408
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure, namely frequent compensating base changes, is increasingly likely to cause sequence-based alignment methods to misalign, or even refuse to align, homologous ncRNAs, consequently obscuring that structural signal. We have used CMfinder, a structure-oriented local alignment tool, to search the ENCODE regions of vertebrate multiple alignments. In agreement with other studies, we find a large number of potential RNA structures in the ENCODE regions. We report 6587 candidate regions with an estimated false-positive rate of 50%. More intriguingly, many of these candidates may be better represented by alignments that take RNA secondary structure into account than by those based on primary sequence alone, often quite dramatically. For example, approximately one-quarter of our predicted motifs show revisions in >50% of their aligned positions. Furthermore, our results are strongly complementary to those discovered by sequence-alignment-based approaches: 84% of our candidates are not covered by Washietl et al., increasing the number of ncRNA candidates in the ENCODE regions by 32%. In a group of 11 ncRNA candidates tested by RT-PCR, 10 were confirmed to be present as RNA transcripts in human tissue, and most show evidence of significant differential expression across tissues. Our results broadly suggest caution in any analysis relying on multiple sequence alignments in less well-conserved regions, clearly support the growing appreciation of the biological significance of ncRNAs, and strongly support the argument for considering RNA structure directly in any search for these elements.
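The structural signal the abstract refers to, compensating base changes, can be illustrated with a small sketch. It assumes a fixed, already-computed alignment and a known consensus structure in dot-bracket notation; the function names (`parse_dot_bracket`, `compensating_changes`, `percent_identity`) are hypothetical and chosen for illustration. This is not CMfinder's method, which builds structure-aware alignments with covariance models. The sketch only counts, for each base pair, the sequences in which both paired positions differ from a reference sequence yet still form a canonical pair. It also reports plain sequence identity, to show how a conserved structure can look poorly conserved to a sequence-only comparison.

```python
# Canonical Watson-Crick pairs plus G-U wobble pairs (assumed pairing rules).
CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return sorted (i, j) base-pair indices (i < j) from a dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return sorted(pairs)


def compensating_changes(alignment: list[str], structure: str) -> list[tuple[int, int, int]]:
    """For each base pair (i, j), count sequences in which BOTH positions
    differ from the first (reference) sequence while the pair stays canonical.
    Returns a list of (i, j, count) tuples."""
    if any(len(seq) != len(structure) for seq in alignment):
        raise ValueError("every aligned sequence must match the structure length")
    reference = alignment[0]
    results = []
    for i, j in parse_dot_bracket(structure):
        count = 0
        for seq in alignment[1:]:
            both_changed = seq[i] != reference[i] and seq[j] != reference[j]
            if both_changed and (seq[i], seq[j]) in CANONICAL_PAIRS:
                count += 1
        results.append((i, j, count))
    return results


def percent_identity(a: str, b: str) -> float:
    """Fraction of aligned columns holding identical characters."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

For a toy hairpin `((...))` with reference `GGAAACC` and homolog `CCAAAGG`, both stem pairs show one compensating change each, while the two sequences share only 3 of 7 positions. A purely sequence-based scorer sees mostly mismatches in the stem, even though every pair is preserved.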
Pages: 242-251
Page count: 10