Large-scale comparison of protein sequence alignment algorithms with structure alignments

Authors
Sauder, JM [1]
Arthur, JW [1]
Dunbrack, RL [1]
Affiliations
[1] Fox Chase Canc Ctr, Inst Canc Res, Philadelphia, PA 19111 USA
Keywords
homology modeling; PSI-BLAST; intermediate sequence; SCOP; alignment benchmark;
DOI
10.1002/(SICI)1097-0134(20000701)40:1<6::AID-PROT30>3.0.CO;2-7
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Sequence alignment programs such as BLAST and PSI-BLAST are used routinely in pairwise, profile-based, or intermediate-sequence-search (ISS) methods to detect remote homologies for the purposes of fold assignment and comparative modeling. Yet the sequence alignment quality of these methods at low sequence identity is not known. We have used the CE structure alignment program (Shindyalov and Bourne, Prot Eng 1998;11:739) to derive sequence alignments for all superfamily and family-level related proteins in the SCOP domain database. CE aligns structures and their sequences based on distances within each protein, rather than on interprotein distances. We compared BLAST, PSI-BLAST, CLUSTALW, and ISS alignments with the CE structural alignments. We found that global alignments with CLUSTALW were very poor at low sequence identity (<25%), as judged by the CE alignments. We used PSI-BLAST to search the nonredundant sequence database (nr) with every sequence in SCOP using up to four iterations. The resulting matrix was used to search a database of SCOP sequences. PSI-BLAST is only slightly better than BLAST in alignment accuracy on a per-residue basis, but PSI-BLAST matrix alignments are much longer than BLAST's, and so correctly align a larger fraction of the total number of aligned residues in the structure alignments. Any two SCOP sequences in the same superfamily that shared a hit or hits in the nr PSI-BLAST searches were identified as linked by the shared intermediate sequence. We examined the quality of the longest SCOP-query/SCOP-hit alignment via an intermediate sequence, and found that ISS produced longer alignments than PSI-BLAST searches alone, of nearly comparable per-residue quality. At 10-15% sequence identity, BLAST correctly aligns 28%, PSI-BLAST 40%, and ISS 46% of residues according to the structure alignments. We also compared CE structure alignments with FSSP structure alignments generated by the DALI program.
In contrast to the sequence methods, CE and structure alignments from the FSSP database identically align 75% of residue pairs at the 10-15% level of sequence identity, indicating that there is substantial room for improvement in these sequence alignment methods. BLAST produced alignments for 8% of the 10,665 nonimmunoglobulin SCOP superfamily sequence pairs (nearly all <25% sequence identity), PSI-BLAST matched 17%, and the double-PSI-BLAST ISS method aligned 38% with E-values <10.0. The results indicate that intermediate sequences may be useful not only in fold assignment but also in achieving more complete sequence alignments for comparative modeling. © 2000 Wiley-Liss, Inc.
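The abstract distinguishes two quantities: accuracy on a per-residue basis, and the fraction of the structure-alignment residues that a sequence method aligns correctly. The Python sketch below illustrates one plausible way to compute both from a pair of gapped alignment rows. It is not the authors' code: the function names `aligned_pairs` and `score_alignment`, the `-` gap character, and the exact definitions (correct pairs divided by test pairs, and correct pairs divided by reference pairs) are assumptions made for illustration.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]  # (residue index in sequence A, residue index in sequence B)


def aligned_pairs(row_a: str, row_b: str, gap: str = "-") -> Set[Pair]:
    """Return the set of residue-index pairs matched by a two-row alignment.

    Hypothetical helper: both rows must have equal length, and indices count
    residues from 0 within the rows given (gaps are skipped).
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    pairs: Set[Pair] = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def score_alignment(test: Set[Pair], reference: Set[Pair]) -> Tuple[float, float]:
    """Return (per_residue_accuracy, reference_coverage).

    Assumed definitions:
      per_residue_accuracy = correct pairs / pairs in the test alignment
      reference_coverage   = correct pairs / pairs in the reference alignment
    """
    correct = len(test & reference)
    accuracy = correct / len(test) if test else 0.0
    coverage = correct / len(reference) if reference else 0.0
    return accuracy, coverage


# Toy reference (standing in for a structure-derived alignment) and a toy
# sequence alignment that leaves one residue pair unaligned.
reference = aligned_pairs("ACDEFG", "ACDEFG")
test = aligned_pairs("ACDE-FG", "ACDEF-G")
accuracy, coverage = score_alignment(test, reference)
```

Under these definitions a short local alignment can score high accuracy while covering few reference pairs, which is the contrast the abstract draws between BLAST and the longer PSI-BLAST and ISS alignments.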
Pages: 6-22
Page count: 17