Large-scale comparison of protein sequence alignment algorithms with structure alignments

Authors
Sauder, JM [1]
Arthur, JW [1]
Dunbrack, RL [1]
Affiliations
[1] Fox Chase Canc Ctr, Inst Canc Res, Philadelphia, PA 19111 USA
Keywords
homology modeling; PSI-BLAST; intermediate sequence; SCOP; alignment benchmark
DOI
10.1002/(SICI)1097-0134(20000701)40:1<6::AID-PROT30>3.0.CO;2-7
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Sequence alignment programs such as BLAST and PSI-BLAST are used routinely in pairwise, profile-based, or intermediate-sequence-search (ISS) methods to detect remote homologies for the purposes of fold assignment and comparative modeling. Yet, the sequence alignment quality of these methods at low sequence identity is not known. We have used the CE structure alignment program (Shindyalov and Bourne, Prot Eng 1998;11:739) to derive sequence alignments for all superfamily and family-level related proteins in the SCOP domain database. CE aligns structures and their sequences based on distances within each protein, rather than on interprotein distances. We compared BLAST, PSI-BLAST, CLUSTALW, and ISS alignments with the CE structural alignments. We found that global alignments with CLUSTALW were very poor at low sequence identity (<25%), as judged by the CE alignments. We used PSI-BLAST to search the nonredundant sequence database (nr) with every sequence in SCOP using up to four iterations. The resulting matrix was used to search a database of SCOP sequences. PSI-BLAST is only slightly better than BLAST in alignment accuracy on a per-residue basis, but PSI-BLAST matrix alignments are much longer than BLAST's, and so correctly align a larger fraction of the total number of aligned residues in the structure alignments. Any two SCOP sequences in the same superfamily that shared a hit or hits in the nr PSI-BLAST searches were identified as linked by the shared intermediate sequence. We examined the quality of the longest SCOP-query/SCOP-hit alignment via an intermediate sequence, and found that ISS produced longer alignments than PSI-BLAST searches alone, of nearly comparable per-residue quality. At 10-15% sequence identity, BLAST correctly aligns 28%, PSI-BLAST 40%, and ISS 46% of residues according to the structure alignments. We also compared CE structure alignments with FSSP structure alignments generated by the DALI program.
In contrast to the sequence methods, CE and structure alignments from the FSSP database identically align 75% of residue pairs at the 10-15% level of sequence identity, indicating that there is substantial room for improvement in these sequence alignment methods. BLAST produced alignments for 8% of the 10,665 nonimmunoglobulin SCOP superfamily sequence pairs (nearly all <25% sequence identity), PSI-BLAST matched 17%, and the double-PSI-BLAST ISS method aligned 38% with E-values <10.0. The results indicate that intermediate sequences may be useful not only in fold assignment but also in achieving more complete sequence alignments for comparative modeling. © 2000 Wiley-Liss, Inc.
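The abstract distinguishes two ways of scoring a sequence alignment against a structure alignment: accuracy per aligned residue, and the fraction of the structure alignment that is recovered. The Python sketch below illustrates that distinction on toy data. It is not the authors' code: the function names, the gapped-row input format, and the treatment of a residue pair as "correct" only when the identical pair appears in the reference alignment are all assumptions made for illustration.

```python
def aligned_pairs(row_a, row_b, start_a=0, start_b=0):
    """Return the set of (i, j) residue-index pairs aligned by two gapped rows.

    row_a and row_b are equal-length strings using '-' for gaps (assumed format).
    start_a and start_b give the index of the first residue of each row in its
    full sequence, so that local alignments can be compared on a common numbering.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i, j = start_a, start_b
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def alignment_quality(seq_pairs, struct_pairs):
    """Score a sequence alignment against a reference structure alignment.

    Returns (per_residue, coverage):
      per_residue = correct pairs / pairs aligned by the sequence method
      coverage    = correct pairs / pairs aligned in the structure alignment
    """
    correct = seq_pairs & struct_pairs
    per_residue = len(correct) / len(seq_pairs) if seq_pairs else 0.0
    coverage = len(correct) / len(struct_pairs) if struct_pairs else 0.0
    return per_residue, coverage


# Toy reference: six residues aligned position by position.
reference = aligned_pairs("ACDEFG", "ACDEFG")

# A short but accurate alignment, and a longer one of equal per-residue accuracy.
short_aln = aligned_pairs("ACD", "ACD")
long_aln = aligned_pairs("ACDEF", "ACDEF")

print(alignment_quality(short_aln, reference))
print(alignment_quality(long_aln, reference))
```

In this toy case both alignments have the same per-residue accuracy, but the longer one recovers more of the reference, which mirrors the BLAST versus PSI-BLAST comparison described in the abstract.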
Pages: 6-22
Number of pages: 17