PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences

Cited by: 281
Authors
Mirarab, Siavash [1]
Nguyen, Nam [1]
Guo, Sheng [2]
Wang, Li-San [3]
Kim, Junhyong [3]
Warnow, Tandy [1,4]
Affiliations
[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
[2] Univ Penn, Genom & Computat Biol Grad Grp, Philadelphia, PA 19104 USA
[3] Univ Penn, Inst Biomed Informat, Philadelphia, PA 19104 USA
[4] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA);
Keywords
metagenomics; phylogenetic trees; molecular evolution; algorithms; multiple alignment; INFORMATION; ACCURACY; FAMILIES; DATABASE; TREES;
DOI
10.1089/cmb.2014.0156
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATe). We also show that trees estimated on PASTA alignments are highly accurate: slightly better than SATe trees, and substantially better than trees from other methods. Finally, PASTA is faster than SATe, highly parallelizable, and requires relatively little memory.
Pages: 377-386
Page count: 10