ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference

Cited by: 212
Authors
Steenwyk, Jacob L. [1]
Buida, Thomas J., III
Li, Yuanning [1]
Shen, Xing-Xing [2]
Rokas, Antonis [1]
Affiliations
[1] Vanderbilt Univ, Dept Biol Sci, 221 Kirkland Hall, Nashville, TN 37235 USA
[2] Zhejiang Univ, Key Lab Mol Biol Crop Pathogens & Insects, Minist Agr, Inst Insect Sci, Hangzhou, Peoples R China
Funding
US National Institutes of Health; US National Science Foundation;
Keywords
TREE; COALESCENT; PLACEMENT; SISTER; SITES; RATES; TOOL;
DOI
10.1371/journal.pbio.3001007
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Highly divergent sites in multiple sequence alignments (MSAs), which can stem from erroneous inference of homology and saturation of substitutions, are thought to negatively impact phylogenetic inference. Thus, several different trimming strategies have been developed for identifying and removing these sites prior to phylogenetic inference. However, a recent study reported that doing so can worsen inference, underscoring the need for alternative alignment trimming strategies. Here, we introduce ClipKIT, an alignment trimming software that, rather than identifying and removing putatively phylogenetically uninformative sites, instead aims to identify and retain parsimony-informative sites, which are known to be phylogenetically informative. To test the efficacy of ClipKIT, we examined the accuracy and support of phylogenies inferred from 14 different alignment trimming strategies, including those implemented in ClipKIT, across nearly 140,000 alignments from a broad sampling of evolutionary histories. Phylogenies inferred from ClipKIT-trimmed alignments are accurate, robust, and time saving. Furthermore, ClipKIT consistently outperformed other trimming methods across diverse datasets, suggesting that strategies based on identifying and retaining parsimony-informative sites provide a robust framework for alignment trimming.
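The idea of retaining parsimony-informative sites can be illustrated with a minimal sketch. It uses the standard definition (a site is parsimony-informative when at least two character states each occur in at least two sequences) and is not ClipKIT's actual implementation. The function names, the toy alignment, and the choice to ignore `-` and `?` as missing data are assumptions made for illustration only.

```python
from collections import Counter

# Assumption: gaps and '?' are treated as missing data and never counted as states.
MISSING = {"-", "?"}


def is_parsimony_informative(column):
    """Return True if at least two character states each occur at least twice."""
    counts = Counter(c.upper() for c in column if c not in MISSING)
    states_seen_twice = sum(1 for n in counts.values() if n >= 2)
    return states_seen_twice >= 2


def keep_parsimony_informative(alignment):
    """Hypothetical helper: keep only parsimony-informative columns.

    `alignment` maps a taxon name to its aligned sequence; all sequences
    must have the same length.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*seqs) walks the alignment column by column.
    kept = [i for i, col in enumerate(zip(*seqs)) if is_parsimony_informative(col)]
    return {name: "".join(alignment[name][i] for i in kept) for name in names}


# Toy alignment (made up for illustration, not data from the study).
toy = {
    "taxon_a": "ACGTA-",
    "taxon_b": "ACGTAA",
    "taxon_c": "ATGCAC",
    "taxon_d": "ATGCTC",
}
trimmed = keep_parsimony_informative(toy)
print(trimmed)
```

In this toy case only the second and fourth columns are retained: constant columns, columns with a single variant sequence, and columns where only one state recurs are all dropped.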
Pages: 17