TCS: A New Multiple Sequence Alignment Reliability Measure to Estimate Alignment Accuracy and Improve Phylogenetic Tree Reconstruction

Cited by: 136
Authors
Chang, Jia-Ming [1, 2]
Di Tommaso, Paolo [1, 2]
Notredame, Cedric [1, 2]
Affiliations
[1] Ctr Genom Regulat CRG, Comparat Bioinformat Bioinformat & Genom Programm, Barcelona 08003, Spain
[2] UPF, Barcelona, Spain
Funding
European Research Council
Keywords
T-Coffee; multiple sequence alignment; alignment uncertainty; alignment confidence; phylogeny; homology modeling; CLUSTAL-W; PERFORMANCE; ALGORITHM; SELECTION; BLOCKS; MODELS; MUSCLE; COFFEE; MAFFT; TIME;
DOI
10.1093/molbev/msu117
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Multiple sequence alignment (MSA) is a key modeling procedure when analyzing biological sequences. Homology and evolutionary modeling are the most common applications of MSAs. Both are known to be sensitive to the underlying MSA accuracy. In this work, we show how this problem can be partly overcome using the transitive consistency score (TCS), an extended version of the T-Coffee scoring scheme. Using this local evaluation function, we show that one can identify the most reliable portions of an MSA, as judged from BAliBASE and PREFAB structure-based reference alignments. We also show how this measure can be used to improve phylogenetic tree reconstruction using both an established simulated data set and a novel empirical yeast data set. For this purpose, we describe a novel lossless alternative to site filtering that involves overweighting the trustworthy columns. Our approach relies on the T-Coffee framework; it uses libraries of pairwise alignments to evaluate any third party MSA. Pairwise projections can be produced using fast or slow methods, thus allowing a trade-off between speed and accuracy. We compared TCS with Heads-or-Tails, GUIDANCE, Gblocks, and trimAl and found it to lead to significantly better estimates of structural accuracy and more accurate phylogenetic trees. The software is available from www.tcoffee.org/Projects/tcs.
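The column-overweighting idea mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes each alignment column already carries an integer reliability score (for example on a 0 to 9 scale) and replicates every column in proportion to that score before the alignment is passed to a tree-building program. The function name `overweight_columns`, the input layout, and the choice to keep at least one copy of every column (so that no site is discarded) are assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List


def overweight_columns(alignment: Dict[str, str],
                       column_scores: List[int]) -> Dict[str, str]:
    """Replicate each alignment column according to its reliability score.

    alignment     -- hypothetical layout: sequence name -> aligned sequence
    column_scores -- one integer score per alignment column
    Columns scoring 0 are kept once (an assumption of this sketch), so the
    output never loses a site; it only gives trusted sites more weight.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if lengths != {len(column_scores)}:
        raise ValueError("every sequence needs exactly one residue per scored column")

    return {
        name: "".join(residue * max(score, 1)
                      for residue, score in zip(seq, column_scores))
        for name, seq in alignment.items()
    }


# Toy example with made-up sequences and scores.
toy_alignment = {"seqA": "MK-L", "seqB": "MKAL", "seqC": "MR-L"}
toy_scores = [9, 3, 0, 2]
weighted = overweight_columns(toy_alignment, toy_scores)
print(weighted["seqA"])  # MMMMMMMMMKKK-LL
```

Unlike filtering, which would drop the low-scoring third column entirely, this keeps it once while the high-scoring first column contributes nine times to any downstream site-based analysis.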
Pages: 1625-1637
Page count: 13