The accuracy of several multiple sequence alignment programs for proteins

Cited by: 102
Authors
Nuin, Paulo A. S.
Wang, Zhouzhi
Tillier, Elisabeth R. M.
Affiliations
[1] University of Toronto, Ontario Cancer Institute, Health Network, Division of Cancer Genomics and Proteomics, Toronto, ON M5G 1L7, Canada
[2] University of Toronto, Department of Medical Biophysics, Toronto, ON, Canada
DOI
10.1186/1471-2105-7-471
Chinese Library Classification (CLC)
Q5 [Biochemistry]
Discipline codes
071010; 081704
Abstract
Background: Many algorithms and software programs have been implemented to infer multiple sequence alignments of protein and DNA sequences. The "true" alignment is usually unknown because the evolutionary history of the sequences is only partly known, which makes it difficult to gauge the relative accuracy of the programs.

Results: We tested nine of the most widely used protein alignment programs and compared their results on sequences generated with the simulation software Simprot, which creates known alignments under realistic and controlled evolutionary scenarios. We simulated more than 30,000 alignment sets under various evolutionary histories in order to define the strengths and weaknesses of each program. We found that alignment accuracy is highly dependent on the number of insertions and deletions (indels) in the sequences, and that indel size has a weaker effect. We also considered benchmark alignments from the latest version of BAliBASE, and the results on the BAliBASE and Simprot-generated data sets were consistent in most cases.

Conclusion: Our results indicate that using Simprot's simulated sequences allows the creation of a more flexible and broader range of alignment classes than the usual methods for assessing alignment accuracy. Simprot also allows a quick and efficient analysis of a wider range of possible evolutionary histories, some of which might not be present in currently available alignment sets. Among the nine programs tested, the iterative approach available in Mafft (L-INS-i) and ProbCons were consistently the most accurate, with Mafft being the faster of the two.
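The abstract measures accuracy against a known "true" alignment but does not state which score was used. Two reference-based scores commonly reported with BAliBASE are the sum-of-pairs (SP) score, which is the fraction of residue pairs aligned in the reference that the test alignment also aligns, and the total column (TC) score, which is the fraction of reference columns reproduced exactly. The following is a minimal Python sketch, not the authors' scoring code. It assumes both alignments are lists of gapped strings (with "-" as the gap character) covering the same sequences in the same order. The function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations
from typing import Optional

GAP = "-"  # assumed gap character


def column_tuples(alignment: list[str]) -> list[tuple[Optional[int], ...]]:
    """Describe each column as (residue index in row 0, row 1, ...); None marks a gap."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    counters = [0] * len(alignment)  # next ungapped residue index per row
    columns = []
    for col in range(width):
        entry: list[Optional[int]] = []
        for row_idx, row in enumerate(alignment):
            if row[col] == GAP:
                entry.append(None)
            else:
                entry.append(counters[row_idx])
                counters[row_idx] += 1
        if any(e is not None for e in entry):  # ignore all-gap columns
            columns.append(tuple(entry))
    return columns


def aligned_pairs(alignment: list[str]) -> set[tuple[int, int, int, int]]:
    """All (row i, residue ri, row j, residue rj) pairs placed in the same column."""
    pairs = set()
    for column in column_tuples(alignment):
        for (i, ri), (j, rj) in combinations(enumerate(column), 2):
            if ri is not None and rj is not None:
                pairs.add((i, ri, j, rj))
    return pairs


def _check_same_sequences(test: list[str], reference: list[str]) -> None:
    """Both alignments must contain the same ungapped sequences in the same order."""
    if len(test) != len(reference):
        raise ValueError("test and reference must have the same number of rows")
    for t, r in zip(test, reference):
        if t.replace(GAP, "") != r.replace(GAP, ""):
            raise ValueError("test and reference must align the same sequences")


def sum_of_pairs(test: list[str], reference: list[str]) -> float:
    """SP score: fraction of reference residue pairs also aligned in the test."""
    _check_same_sequences(test, reference)
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        raise ValueError("reference aligns no residue pairs; SP score is undefined")
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


def total_column(test: list[str], reference: list[str]) -> float:
    """TC score: fraction of (non-empty) reference columns reproduced exactly."""
    _check_same_sequences(test, reference)
    ref_cols = column_tuples(reference)
    if not ref_cols:
        raise ValueError("reference has no columns; TC score is undefined")
    test_cols = set(column_tuples(test))
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)


if __name__ == "__main__":
    reference = ["ACD-E", "A-DFE"]  # toy "true" alignment, e.g. from a simulator
    test = ["ACDE-", "A-DFE"]       # toy program output over the same sequences
    print(round(sum_of_pairs(test, reference), 3))  # 0.667
    print(total_column(test, reference))            # 0.6
```

In the toy example, the test alignment misplaces one indel. It recovers 2 of the 3 reference residue pairs and 3 of the 5 reference columns. This sketch counts every non-empty reference column. Real benchmarks such as BAliBASE may restrict scoring to annotated core regions.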
Journal: BMC Bioinformatics, 7
Pages: 18