Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega

Cited by: 11059
Authors
Sievers, Fabian [1]
Wilm, Andreas [2]
Dineen, David [1]
Gibson, Toby J. [3]
Karplus, Kevin [4]
Li, Weizhong [5]
Lopez, Rodrigo [5]
McWilliam, Hamish [5]
Remmert, Michael [6]
Soeding, Johannes [6]
Thompson, Julie D. [7]
Higgins, Desmond G. [1]
Affiliations
[1] Univ Coll Dublin, UCD Conway Inst Biomol & Biomed Res, Sch Med & Med Sci, Dublin 4, Ireland
[2] Genome Inst Singapore, Singapore, Singapore
[3] European Mol Biol Lab, Struct & Computat Biol Unit, Heidelberg, Germany
[4] Univ Calif Santa Cruz, Dept Biomol Engn, Santa Cruz, CA 95064 USA
[5] European Bioinformat Inst, EMBL Outstn, Cambridge, England
[6] Univ Munich LMU, Gene Ctr Munich, Munich, Germany
[7] Univ Strasbourg, Dept Biol Struct & Genom, IGBMC, CNRS,INSERM, Illkirch Graffenstaden, France
Funding
Science Foundation Ireland
Keywords
bioinformatics; hidden Markov models; multiple sequence alignment; CONSTRUCTION; ALGORITHM; ACCURATE; COFFEE; TREES;
DOI
10.1038/msb.2011.75
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.
Molecular Systems Biology 7: 539; published online 11 October 2011; doi:10.1038/msb.2011.75
Pages: 6