Large-scale comparison of protein sequence alignment algorithms with structure alignments

Authors
Sauder, JM [1]
Arthur, JW [1]
Dunbrack, RL [1]
Affiliations
[1] Fox Chase Canc Ctr, Inst Canc Res, Philadelphia, PA 19111 USA
Keywords
homology modeling; PSI-BLAST; intermediate sequence; SCOP; alignment benchmark
DOI
10.1002/(SICI)1097-0134(20000701)40:1<6::AID-PROT30>3.0.CO;2-7
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
Sequence alignment programs such as BLAST and PSI-BLAST are used routinely in pairwise, profile-based, or intermediate-sequence-search (ISS) methods to detect remote homologies for the purposes of fold assignment and comparative modeling. Yet the sequence alignment quality of these methods at low sequence identity is not known. We have used the CE structure alignment program (Shindyalov and Bourne, Prot Eng 1998;11:739) to derive sequence alignments for all superfamily and family-level related proteins in the SCOP domain database. CE aligns structures and their sequences based on distances within each protein, rather than on interprotein distances. We compared BLAST, PSI-BLAST, CLUSTALW, and ISS alignments with the CE structural alignments. We found that global alignments with CLUSTALW were very poor at low sequence identity (<25%), as judged by the CE alignments. We used PSI-BLAST to search the nonredundant sequence database (nr) with every sequence in SCOP using up to four iterations. The resulting matrix was used to search a database of SCOP sequences. PSI-BLAST is only slightly better than BLAST in alignment accuracy on a per-residue basis, but PSI-BLAST matrix alignments are much longer than BLAST's, and so align correctly a larger fraction of the total number of aligned residues in the structure alignments. Any two SCOP sequences in the same superfamily that shared a hit or hits in the nr PSI-BLAST searches were identified as linked by the shared intermediate sequence. We examined the quality of the longest SCOP-query/SCOP-hit alignment via an intermediate sequence, and found that ISS produced longer alignments than PSI-BLAST searches alone, of nearly comparable per-residue quality. At 10-15% sequence identity, BLAST correctly aligns 28%, PSI-BLAST 40%, and ISS 46% of residues according to the structure alignments. We also compared CE structure alignments with FSSP structure alignments generated by the DALI program.
In contrast to the sequence methods, CE and structure alignments from the FSSP database identically align 75% of residue pairs at the 10-15% level of sequence identity, indicating that there is substantial room for improvement in these sequence alignment methods. BLAST produced alignments for 8% of the 10,665 nonimmunoglobulin SCOP superfamily sequence pairs (nearly all <25% sequence identity), PSI-BLAST matched 17% and the double-PSI-BLAST ISS method aligned 38% with E-values <10.0. The results indicate that intermediate sequences may be useful not only in fold assignment but also in achieving more complete sequence alignments for comparative modeling. © 2000 Wiley-Liss, Inc.
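The abstract distinguishes two ways of scoring a sequence alignment against a structure alignment: accuracy on a per-residue basis, and the fraction of structure-aligned residues that the sequence method aligns correctly. The Python sketch below illustrates one plausible reading of those two measures. It is not the authors' code; the function names `aligned_pairs` and `compare_to_reference`, the gap character, and the offset parameters are assumptions made for illustration.

```python
def aligned_pairs(row_a, row_b, start_a=0, start_b=0, gap="-"):
    """Return the set of (i, j) residue-index pairs aligned in two gapped rows.

    start_a / start_b are hypothetical offsets for local alignments that
    begin partway into each sequence (0-based residue indices).
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i, j = start_a, start_b
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))  # both columns hold residues: an aligned pair
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def compare_to_reference(test_pairs, ref_pairs):
    """Score a test alignment against a reference (e.g. structural) alignment.

    Returns (per_residue_accuracy, coverage):
      per_residue_accuracy = correct pairs / pairs in the test alignment
      coverage             = correct pairs / pairs in the reference alignment
    """
    correct = test_pairs & ref_pairs
    accuracy = len(correct) / len(test_pairs) if test_pairs else 0.0
    coverage = len(correct) / len(ref_pairs) if ref_pairs else 0.0
    return accuracy, coverage


# Toy data, invented for illustration only.
reference = aligned_pairs("ACDEFGH", "ACDEFGH")   # 7 structure-aligned pairs
short_hit = aligned_pairs("ACDE", "ACDE")         # short but fully correct
shifted = aligned_pairs("ACDEFGH", "-ACDEFG")     # long but off by one

short_scores = compare_to_reference(short_hit, reference)
shifted_scores = compare_to_reference(shifted, reference)
```

In this toy case the short alignment is perfectly accurate per residue yet recovers only four of the seven reference pairs, which mirrors the abstract's point that a longer alignment of similar per-residue quality aligns a larger fraction of the structure alignment correctly.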
Pages: 6-22
Page count: 17