ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference

Cited by: 212
Authors
Steenwyk, Jacob L. [1]
Buida, Thomas J., III
Li, Yuanning [1]
Shen, Xing-Xing [2]
Rokas, Antonis [1]
Affiliations
[1] Vanderbilt Univ, Dept Biol Sci, 221 Kirkland Hall, Nashville, TN 37235 USA
[2] Zhejiang Univ, Key Lab Mol Biol Crop Pathogens & Insects, Minist Agr, Inst Insect Sci, Hangzhou, Peoples R China
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
TREE; COALESCENT; PLACEMENT; SISTER; SITES; RATES; TOOL
DOI
10.1371/journal.pbio.3001007
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Highly divergent sites in multiple sequence alignments (MSAs), which can stem from erroneous inference of homology and saturation of substitutions, are thought to negatively impact phylogenetic inference. Thus, several different trimming strategies have been developed for identifying and removing these sites prior to phylogenetic inference. However, a recent study reported that doing so can worsen inference, underscoring the need for alternative alignment trimming strategies. Here, we introduce ClipKIT, an alignment trimming software that, rather than identifying and removing putatively phylogenetically uninformative sites, instead aims to identify and retain parsimony-informative sites, which are known to be phylogenetically informative. To test the efficacy of ClipKIT, we examined the accuracy and support of phylogenies inferred from 14 different alignment trimming strategies, including those implemented in ClipKIT, across nearly 140,000 alignments from a broad sampling of evolutionary histories. Phylogenies inferred from ClipKIT-trimmed alignments are accurate, robust, and time saving. Furthermore, ClipKIT consistently outperformed other trimming methods across diverse datasets, suggesting that strategies based on identifying and retaining parsimony-informative sites provide a robust framework for alignment trimming.
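The retention idea described in the abstract can be illustrated with a minimal sketch. It assumes the standard definition of a parsimony-informative site: a column with at least two distinct character states, each present in at least two sequences. The function names, the dictionary-based alignment representation, and the choice of `-` and `?` as gap symbols are assumptions made for this illustration; this is not ClipKIT's implementation and does not reproduce its trimming modes.

```python
from collections import Counter

# Assumption for this sketch: these symbols mark gaps or missing data
# and are ignored when counting character states.
GAP_CHARS = {"-", "?"}


def is_parsimony_informative(column):
    """Return True if at least two states each occur in at least two sequences."""
    counts = Counter(ch.upper() for ch in column if ch not in GAP_CHARS)
    states_seen_twice = sum(1 for n in counts.values() if n >= 2)
    return states_seen_twice >= 2


def keep_parsimony_informative(alignment):
    """Keep only parsimony-informative columns.

    `alignment` is a hypothetical representation: a dict mapping
    sequence names to aligned sequences of equal length.
    """
    if not alignment:
        return {}
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    keep = [
        i for i in range(length)
        if is_parsimony_informative([s[i] for s in seqs])
    ]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


# Toy alignment (made-up data for illustration only).
toy = {
    "taxon_a": "ACGT-A",
    "taxon_b": "ACGTTA",
    "taxon_c": "ATGA-C",
    "taxon_d": "ATCATC",
}
trimmed = keep_parsimony_informative(toy)
```

In the toy alignment, only the second, fourth, and sixth columns have two states that each occur at least twice, so those are the columns retained.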
Pages: 17