TCS: A New Multiple Sequence Alignment Reliability Measure to Estimate Alignment Accuracy and Improve Phylogenetic Tree Reconstruction

Cited by: 136
Authors
Chang, Jia-Ming [1, 2]
Di Tommaso, Paolo [1, 2]
Notredame, Cedric [1, 2]
Affiliations
[1] Ctr Genom Regulat CRG, Comparat Bioinformat Bioinformat & Genom Programm, Barcelona 08003, Spain
[2] UPF, Barcelona, Spain
Funding
European Research Council
Keywords
T-Coffee; multiple sequence alignment; alignment uncertainty; alignment confidence; phylogeny; homology modeling; CLUSTAL-W; PERFORMANCE; ALGORITHM; SELECTION; BLOCKS; MODELS; MUSCLE; COFFEE; MAFFT; TIME;
DOI
10.1093/molbev/msu117
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Multiple sequence alignment (MSA) is a key modeling procedure when analyzing biological sequences. Homology and evolutionary modeling are the most common applications of MSAs. Both are known to be sensitive to the underlying MSA accuracy. In this work, we show how this problem can be partly overcome using the transitive consistency score (TCS), an extended version of the T-Coffee scoring scheme. Using this local evaluation function, we show that one can identify the most reliable portions of an MSA, as judged from BAliBASE and PREFAB structure-based reference alignments. We also show how this measure can be used to improve phylogenetic tree reconstruction using both an established simulated data set and a novel empirical yeast data set. For this purpose, we describe a novel lossless alternative to site filtering that involves overweighting the trustworthy columns. Our approach relies on the T-Coffee framework; it uses libraries of pairwise alignments to evaluate any third party MSA. Pairwise projections can be produced using fast or slow methods, thus allowing a trade-off between speed and accuracy. We compared TCS with Heads-or-Tails, GUIDANCE, Gblocks, and trimAl and found it to lead to significantly better estimates of structural accuracy and more accurate phylogenetic trees. The software is available from www.tcoffee.org/Projects/tcs.
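The two ideas in the abstract, scoring an MSA against a library of pairwise alignments and overweighting trustworthy columns instead of deleting unreliable ones, can be illustrated with a toy sketch. This is not the TCS implementation: the real score builds on the T-Coffee scoring scheme and its transitive consistency, whereas the sketch below only measures, for each column, the fraction of aligned residue pairs that also appear in some pairwise alignment of the library. The function names (`column_pairs`, `column_support`, `replicate_columns`), the `extra` parameter, the replication rule, and the example sequences are all assumptions made for illustration.

```python
from itertools import combinations


def column_pairs(alignment):
    """For each column, return the set of residue pairs it aligns.

    `alignment` maps sequence name -> gapped string ('-' is a gap).
    A residue is identified as (sequence name, ungapped position).
    """
    names = sorted(alignment)
    position = {name: 0 for name in names}
    length = len(alignment[names[0]])
    per_column = []
    for col in range(length):
        residues = []
        for name in names:
            if alignment[name][col] != "-":
                residues.append((name, position[name]))
                position[name] += 1
        per_column.append(set(combinations(residues, 2)))
    return per_column


def column_support(msa, library):
    """Toy reliability score per MSA column, in [0, 1].

    The score is the fraction of the column's residue pairs that are
    also aligned in at least one pairwise alignment of `library`.
    Columns with fewer than two residues score 0.0.
    """
    supported = set()
    for pairwise in library:
        for pairs in column_pairs(pairwise):
            supported |= pairs
    scores = []
    for pairs in column_pairs(msa):
        scores.append(len(pairs & supported) / len(pairs) if pairs else 0.0)
    return scores


def replicate_columns(msa, scores, extra=2):
    """Lossless weighting: keep every column, repeat well-supported ones.

    Hypothetical rule: each column appears 1 + round(score * extra) times.
    """
    copies = [1 + round(score * extra) for score in scores]
    return {
        name: "".join(ch * k for ch, k in zip(row, copies))
        for name, row in msa.items()
    }


# Made-up example: a third-party MSA and a library of pairwise alignments.
msa = {"A": "ACG-T", "B": "A-GCT", "C": "ACGCT"}
library = [
    {"A": "ACG-T", "B": "A-GCT"},
    {"A": "ACG-T", "C": "ACGCT"},
    {"B": "AG-CT", "C": "ACGCT"},  # disagrees with the MSA on one residue of B
]

scores = column_support(msa, library)
weighted = replicate_columns(msa, scores, extra=2)
```

In this example the third column is the only one where a pair aligned by the MSA is absent from the library, so it receives a lower score and fewer copies in `weighted`, while no column is discarded. The library could equally be produced by a fast or a slow pairwise aligner, which is where the speed and accuracy trade-off mentioned in the abstract would enter.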
Pages: 1625-1637
Number of pages: 13