Protein sequence similarity searches using patterns as seeds

Cited by: 236
Authors
Zhang, Z
Schaffer, AA
Miller, W
Madden, TL
Lipman, DJ
Koonin, EV
Altschul, SF [1]
Affiliations
[1] NIH, Natl Ctr Biotechnol Informat, Natl Lib Med, Bethesda, MD 20894 USA
[2] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
[3] Natl Human Genome Res Inst, Inherited Dis Res Branch, NIH, Baltimore, MD 21224 USA
DOI
10.1093/nar/26.17.3986
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Protein families often are characterized by conserved sequence patterns or motifs. A researcher frequently wishes to evaluate the significance of a specific pattern within a protein, or to exploit knowledge of known motifs to aid the recognition of greatly diverged but homologous family members. To assist in these efforts, the pattern-hit initiated BLAST (PHI-BLAST) program described here takes as input both a protein sequence and a pattern of interest that it contains. PHI-BLAST searches a protein database for other instances of the input pattern, and uses those found as seeds for the construction of local alignments to the query sequence. The random distribution of PHI-BLAST alignment scores is studied analytically and empirically. In many instances, the program is able to detect statistically significant similarity between homologous proteins that are not recognizably related using traditional single-pass database search methods. PHI-BLAST is applied to the analysis of CED4-like cell death regulators, HS90-type ATPase domains, archaeal tRNA nucleotidyltransferases and archaeal homologs of DnaG-type DNA primases.
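The seeding step summarized in the abstract can be illustrated with a minimal sketch. It is not the PHI-BLAST implementation: it assumes a simplified PROSITE-style pattern syntax (elements joined by `-`, `x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, optional `(n)` or `(n,m)` repeats), and the function names `prosite_to_regex`, `find_pattern_hits` and `find_seeds` are hypothetical. Alignment extension from the seeds and the score statistics are not shown.

```python
import re

# Hypothetical helper: one pattern element, e.g. "G", "x(2)", "[ST]", "{P}(1,3)".
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{"):           # excluded residues
            core = "[^" + core[1:-1] + "]"
        if low is not None:                  # repeat count or range
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_pattern_hits(sequence: str, regex: str) -> list[tuple[int, int]]:
    """Return (start, end) for one pattern match at each start position.

    A lookahead is used so that matches starting inside an earlier
    match are still reported.
    """
    lookahead = re.compile("(?=(" + regex + "))")
    return [(m.start(1), m.end(1)) for m in lookahead.finditer(sequence)]


def find_seeds(query: str, pattern: str, database: dict[str, str]):
    """Collect pattern occurrences in database sequences as candidate seeds."""
    regex = prosite_to_regex(pattern)
    if not find_pattern_hits(query, regex):
        # The abstract states that the query contains the pattern.
        raise ValueError("the query sequence must contain the pattern")
    seeds = {}
    for name, sequence in database.items():
        hits = find_pattern_hits(sequence, regex)
        if hits:
            seeds[name] = hits
    return seeds
```

In this sketch each returned `(start, end)` pair marks where a local alignment to the query could be anchored; sequences without any pattern occurrence are skipped entirely, which is what restricts the search to the pattern-containing part of the database.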
Pages: 3986-3990
Page count: 5