PASTA: Ultra-Large Multiple Sequence Alignment

Authors
Mirarab, Siavash [1]
Nguyen, Nam [1]
Warnow, Tandy [1]
Affiliation
[1] Department of Computer Science, University of Texas at Austin, Austin, TX 78712, USA
Keywords
Multiple sequence alignment; ultra-large; SATe; maximum likelihood; time; accuracy; MUSCLE; trees; MAFFT
DOI
Not available
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
In this paper, we introduce PASTA, a new and highly scalable algorithm for large-scale multiple sequence alignment estimation. PASTA uses a new technique for producing an alignment from a guide tree, which enables it to be both highly scalable and very accurate. We present a study on biological and simulated datasets with up to 200,000 sequences. The study shows that PASTA produces highly accurate alignments and improves on the accuracy of the leading alignment methods on large datasets. It also shows that PASTA can analyze much larger datasets than current methods. We also show that trees estimated on PASTA alignments are highly accurate. They are slightly better than SATe trees and substantially better than trees estimated using other methods. Finally, PASTA is very fast, highly parallelizable, and requires relatively little memory.
Pages: 177-191
Number of pages: 15