Sequence embedding for fast construction of guide trees for multiple sequence alignment

Times cited: 68
Authors
Blackshields, Gordon [1]
Sievers, Fabian [1]
Shi, Weifeng [1]
Wilm, Andreas [1]
Higgins, Desmond G. [1]
Affiliation
[1] Univ Coll Dublin, UCD Conway Inst Biomol & Biomed Sci, Dublin 4, Ireland
Funding
Science Foundation Ireland
Keywords
CLUSTAL-W; DATABASE; MAFFT; ACID
DOI
10.1186/1748-7188-5-21
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Background: The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N² for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments.

Results: In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pairwise distances.

Conclusions: We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz.
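The abstract describes the embedding idea only in outline. As a rough illustration of why it avoids the N² cost, here is a minimal Python sketch under stated assumptions: seed sequences are picked uniformly at random, the pairwise distance is a simple 1 - Jaccard similarity over k-mer sets (k = 2), and every function name (kmer_distance, embed, distance_calls) is hypothetical. It is not the mBed source code from the URL above, and it does not reproduce the paper's seed-selection variants or its clustering and guide-tree step.

```python
import math
import random


def kmer_set(seq, k=2):
    """Distinct k-mers (k-tuples) occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=2):
    """Hypothetical cheap distance: 1 - Jaccard similarity of k-mer sets.

    Assumption: a stand-in for whatever pairwise distance a real tool uses.
    """
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def embed(seqs, n_seeds, rng):
    """Map each sequence to its vector of distances to n_seeds seed sequences.

    Assumption: seeds are drawn uniformly at random. This costs
    len(seqs) * n_seeds distance calls instead of all N(N-1)/2 pairs.
    """
    seeds = rng.sample(range(len(seqs)), n_seeds)
    return [[kmer_distance(s, seqs[j]) for j in seeds] for s in seqs]


def euclidean(u, v):
    """Euclidean distance between two embedded vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def distance_calls(n, n_seeds):
    """(full-matrix pair count, embedding distance count) for n sequences."""
    return n * (n - 1) // 2, n * n_seeds


if __name__ == "__main__":
    rng = random.Random(0)
    # Two pairs of near-identical toy peptides, unrelated to each other.
    seqs = ["MKTAYIAKQR", "MKTAYIAKQL", "GGSWNNHHEE", "GGSWNNHHED"]
    vecs = embed(seqs, n_seeds=2, rng=rng)
    # Related sequences should lie closer together in the embedded space.
    print(euclidean(vecs[0], vecs[1]), euclidean(vecs[0], vecs[2]))
    print(distance_calls(10_000, 200))
```

With 10,000 sequences and an arbitrary, illustrative 200 seeds, distance_calls returns (49995000, 2000000): about 50 million pairwise distances for the full matrix versus 2 million sequence-to-seed distances for the embedding. The embedded vectors would then be handed to a clustering or tree-building step, which this sketch leaves out.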
Pages: 11
Related papers (50 in total)
  • [1] Sequence embedding for fast construction of guide trees for multiple sequence alignment
    Blackshields, Gordon
    Sievers, Fabian
    Shi, Weifeng
    Wilm, Andreas
    Higgins, Desmond G.
    [J]. ALGORITHMS FOR MOLECULAR BIOLOGY, 5
  • [2] Improving multiple sequence alignment by using better guide trees
    Zhan, Qing
    Ye, Yongtao
    Lam, Tak-Wah
    Yiu, Siu-Ming
    Wang, Yadong
    Ting, Hing-Fung
    [J]. BMC BIOINFORMATICS, 2015, 16
  • [3] Multiple Guide Trees in a Tabu Search Algorithm for the Multiple Sequence Alignment Problem
    Mehenni, Tahar
    [J]. COMPUTER SCIENCE AND ITS APPLICATIONS, CIIA 2015, 2015, 456 : 141 - 152
  • [4] PnpProbs: Better Multiple Sequence Alignment by Better Handling of Guide Trees
    Ye, Yongtao
    Lam, Tak-Wah
    Ting, Hing-Fung
    [J]. BIOINFORMATICS RESEARCH AND APPLICATIONS (ISBRA 2015), 2015, 9096 : 424 - 426
  • [5] PnpProbs: a better multiple sequence alignment tool by better handling of guide trees
    Ye, Yongtao
    Lam, Tak-Wah
    Ting, Hing-Fung
    [J]. BMC BIOINFORMATICS, 2016, 17
  • [6] TREES, STARS, AND MULTIPLE BIOLOGICAL SEQUENCE ALIGNMENT
    ALTSCHUL, SF
    LIPMAN, DJ
    [J]. SIAM JOURNAL ON APPLIED MATHEMATICS, 1989, 49 (01) : 197 - 209
  • [7] On the use of suffix trees in multiple sequence alignment
    Sharma, Kai Renganathan
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2005, 230 : U1326 - U1326
  • [8] MAUSA: Using simulated annealing for guide tree construction in multiple sequence alignment
    Uren, P. J.
    Cameron-Jones, R. M.
    Sale, A. H. J.
    [J]. AI 2007: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2007, 4830 : 599 - 608