Genome-wide nucleotide-level mammalian ancestor reconstruction

Cited by: 125
Authors
Paten, Benedict [1]
Herrero, Javier [2]
Fitzgerald, Stephen [2]
Beal, Kathryn [2]
Flicek, Paul [2]
Holmes, Ian [3]
Birney, Ewan [2]
Affiliations
[1] Univ Calif Santa Cruz, Ctr Biomol Sci & Engn, Santa Cruz, CA 95064 USA
[2] EMBL European Bioinformat Inst, Cambridge CB10 1SD, England
[3] Univ Calif Berkeley, Dept Bioengn, Berkeley, CA 94720 USA
DOI
10.1101/gr.076521.108
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Recently, attention has turned to the problem of reconstructing complete ancestral sequences from large multiple alignments. Successful generation of these genome-wide reconstructions will facilitate a greater knowledge of the events that have driven evolution. We present a new evolutionary alignment modeler, called "Ortheus," for inferring the evolutionary history of a multiple alignment, in terms of both substitutions and, importantly, insertions and deletions. Based on a multiple sequence probabilistic transducer model of the type proposed by Holmes, Ortheus uses efficient stochastic graph-based dynamic programming methods. Unlike other methods, Ortheus does not rely on a single fixed alignment from which to work. Ortheus is also more scalable than previous methods while being fast, stable, and open source. Large-scale simulations show that Ortheus performs close to optimally on a deep mammalian phylogeny. Simulations also indicate that significant proportions of errors due to insertions and deletions can be avoided by not assuming a fixed alignment. We additionally use a challenging hold-out cross-validation procedure to test the method; using the reconstructions to predict extant sequence bases, we demonstrate significant improvements over using closest extant neighbor sequences. Accompanying this paper, a new, public, and genome-wide set of Ortheus ancestor alignments provides an intriguing new resource for evolutionary studies in mammals. As a first piece of analysis, we attempt to recover "fossilized" ancestral pseudogenes. We confidently find 31 cases in which the ancestral sequence had a more complete sequence than any of the extant sequences.
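To make the idea of nucleotide-level ancestor reconstruction concrete, the following minimal sketch infers the posterior distribution of the ancestral base for a single alignment column. It is not Ortheus and not the transducer model described in the abstract: it handles substitutions only, on a fixed alignment column, using Felsenstein's pruning algorithm under the Jukes-Cantor (JC69) model, which is exactly the simplification the abstract says Ortheus avoids. The tree shape, branch lengths, and function names are hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"


def jc69_prob(same: bool, t: float) -> float:
    """JC69 probability of ending in the same or a different base after
    branch length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def conditional_likelihoods(node):
    """Felsenstein pruning: {base: P(leaves below node | node has base)}.

    A leaf is a one-character string. An internal node is a tuple
    (left_child, left_branch_length, right_child, right_branch_length).
    """
    if isinstance(node, str):
        return {b: 1.0 if b == node else 0.0 for b in BASES}
    left, t_left, right, t_right = node
    like_left = conditional_likelihoods(left)
    like_right = conditional_likelihoods(right)
    result = {}
    for x in BASES:
        sum_left = sum(jc69_prob(x == y, t_left) * like_left[y] for y in BASES)
        sum_right = sum(jc69_prob(x == y, t_right) * like_right[y] for y in BASES)
        result[x] = sum_left * sum_right
    return result


def root_posterior(tree):
    """Posterior over the root base, using JC69's uniform base frequencies."""
    like = conditional_likelihoods(tree)
    total = sum(0.25 * value for value in like.values())
    return {b: 0.25 * like[b] / total for b in BASES}


# Hypothetical alignment column with four extant bases: ((A, A), (A, G)).
tree = (("A", 0.1, "A", 0.1), 0.1, ("A", 0.1, "G", 0.1), 0.1)
posterior = root_posterior(tree)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

A full reconstruction of the kind the abstract describes would also have to decide which columns existed in the ancestor at all, that is, model insertions and deletions, rather than only choosing a base for a column that is assumed to be present.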
Pages: 1829-1843
Page count: 15