Reconstructing Ancestral Genomic Sequences by Co-Evolution: Formal Definitions, Computational Issues, and Biological Examples

Cited by: 1
Authors
Tuller, Tamir [1, 2]
Birin, Hadas [3]
Kupiec, Martin [4]
Ruppin, Eytan [3, 5]
Affiliations
[1] Weizmann Inst Sci, Fac Math & Comp Sci, IL-76100 Rehovot, Israel
[2] Weizmann Inst Sci, Dept Mol Genet, IL-76100 Rehovot, Israel
[3] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel
[4] Tel Aviv Univ, Dept Mol Microbiol & Biotechnol, IL-69978 Tel Aviv, Israel
[5] Tel Aviv Univ, Sch Med, IL-69978 Tel Aviv, Israel
Keywords
Co-evolution; maximum likelihood; maximum parsimony; reconstruction of ancestral genomes; Saccharomyces cerevisiae; secondary structure; evolution; tree; prediction; gene; information; nucleotide; inference
DOI
10.1089/cmb.2010.0112
CLC classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The inference of ancestral genomes is a fundamental problem in molecular evolution. Because of the statistical nature of this problem, the most likely or most parsimonious ancestral genomes usually have considerable error rates. In general, these errors cannot be eliminated by using more exhaustive computational approaches, longer genomic sequences, or more taxa. In recent studies, we showed that co-evolution is an important force that can be used to significantly improve the inference of ancestral genome content. In this work, we formally define a computational problem for inferring ancestral genome content by co-evolution. We show that this problem is NP-hard and hard to approximate, and we present both a fixed-parameter tractable (FPT) algorithm and heuristic approximation algorithms for solving it. On simulated inputs with hundreds of protein families and hundreds of co-evolutionary relations, these algorithms ran quickly (up to four minutes) and achieved an approximation ratio below 1.3. We use our approach to study the ancestral genome content of the Fungi. To this end, we apply our approach to a dataset of 33,931 protein families and 20,317 co-evolutionary relations. Our algorithm added and removed hundreds of proteins from the ancestral genomes inferred by maximum likelihood (ML) or maximum parsimony (MP) while only slightly affecting the likelihood/parsimony score of the results. A biological analysis revealed various pieces of evidence that support the biological plausibility of the new solutions. In addition, we show that our approach reconstructs missing values at the leaves of the Fungi evolutionary tree better than ML or MP does.
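The abstract does not spell out the formal problem, so the following is only a rough illustration of the idea of trading a little parsimony for consistency with co-evolutionary relations. It assumes a simplified, hypothetical formulation: the ancestral presence (1) or absence (0) of each protein family is scored by parsimony (the number of gain/loss events along tree edges), and each co-evolving pair of families adds an assumed penalty `lam` for every internal node where the two families disagree. The tree, the families `f1`/`f2`, their leaf states, and the penalty term are all made up for this sketch; it does not reproduce the paper's FPT or heuristic algorithms. Exhaustive search is used only because the toy instance is tiny; the abstract notes that the paper's problem is NP-hard, and this brute force does not scale.

```python
from itertools import product

# Hypothetical toy tree (not the Fungi tree from the paper):
# each internal node maps to its (left, right) children.
TREE = {
    "root": ("A1", "A2"),
    "A1": ("s1", "s2"),
    "A2": ("s3", "s4"),
}
INTERNAL = ["root", "A1", "A2"]

# Hypothetical presence (1) / absence (0) of two protein families at the leaves.
LEAVES = {
    "f1": {"s1": 1, "s2": 1, "s3": 0, "s4": 1},
    "f2": {"s1": 1, "s2": 1, "s3": 0, "s4": 0},
}

# Hypothetical co-evolutionary relation between the two families.
COEVOLVING = [("f1", "f2")]


def edges():
    """Yield every (parent, child) edge of the tree."""
    for parent, children in TREE.items():
        for child in children:
            yield parent, child


def parsimony_cost(states):
    """Count gain/loss events (state changes) along the edges for one family."""
    return sum(states[p] != states[c] for p, c in edges())


def disagreements(assignment):
    """Count internal nodes where a co-evolving pair has different states."""
    return sum(
        assignment[f][v] != assignment[g][v]
        for f, g in COEVOLVING
        for v in INTERNAL
    )


def score(assignment, lam):
    """Total parsimony cost plus an assumed penalty `lam` per disagreement."""
    total = sum(parsimony_cost(assignment[f]) for f in LEAVES)
    return total + lam * disagreements(assignment)


def reconstruct(lam):
    """Exhaustively search all ancestral states; feasible only for toy inputs."""
    families = list(LEAVES)
    k = len(INTERNAL)
    best, best_cost = None, float("inf")
    for bits in product((0, 1), repeat=len(families) * k):
        # Slice the bit vector into one block of internal-node states per family.
        assignment = {
            f: {**LEAVES[f], **dict(zip(INTERNAL, bits[i * k:(i + 1) * k]))}
            for i, f in enumerate(families)
        }
        cost = score(assignment, lam)
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


if __name__ == "__main__":
    for lam in (0, 2):
        best, cost = reconstruct(lam)
        print(lam, cost, disagreements(best))
```

With `lam=0` each family is reconstructed independently by parsimony (total cost 2), and `f1` and `f2` necessarily disagree at node `A2`. With `lam=2`, every optimal solution has the pair agreeing at all internal nodes, at the price of one extra gain/loss event (total parsimony cost 3). This loosely mirrors the trade-off described in the abstract, but it is not the authors' method.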
Pages: 1327-1344
Page count: 18