Prediction of the secondary structures of proteins by using PREDICT, a nearest neighbor method on pattern space

Authors
Joo, K [1]
Kim, I
Kim, SY
Lee, J
Lee, J
Lee, SJ
Affiliations
[1] Korea Inst Adv Study, Sch Computat Sci, Seoul 130650, South Korea
[2] Soongsil Univ, Dept Bioinformat & Life Sci, Seoul, South Korea
[3] Soongsil Univ, Bioinformat & Mol Design Technol Innovat Ctr, Seoul, South Korea
[4] Soongsil Univ, Comp Aided Mol Design Res Ctr, Seoul, South Korea
[5] Univ Suwon, Dept Phys, Suwon 445890, South Korea
[6] Univ Suwon, Ctr Smart Biomat, Suwon 445890, South Korea
Keywords
protein structure prediction; secondary structure prediction
DOI
Not available
Chinese Library Classification
O4 [Physics]
Subject classification code
0702
Abstract
We introduce PREDICT (PRofile Enumeration DICTionary), a novel method for predicting the secondary structure of proteins in which the nearest-neighbor method is applied to a pattern space. For a given protein sequence, PSI-BLAST generates a profile that defines a pattern for each amino acid residue and its local sequence environment. By applying PSI-BLAST to protein sequences with known secondary structures, we construct pattern databases. The secondary structure of a query residue in a protein of unknown structure is then determined by comparing the query pattern with those in the pattern databases and selecting the patterns closest to it. We tested PREDICT on the CB513 set (513 non-homologous proteins) in three ways. The first test used a pattern database derived from 7777 proteins in the Protein Data Bank (PDB), including proteins homologous to those in the CB513 set, and gave an average Q3 score of 78.8% per chain. In the second, more stringent test, we removed from the 7777 proteins all those homologous to proteins in the CB513 set, leaving 4330 proteins; pattern databases built from these gave an average Q3 score of 74.6%. In the third test, we selected one query protein from the CB513 set and built pattern databases from the remaining 512 proteins; repeating this procedure for each of the 513 proteins gave an average Q3 score of 73.1%. Finally, we participated in CASP5 (group ID: 531), employing a first-layer database based on the 7777 proteins and a second-layer database based on the CB513 set. PREDICT gave promising results, with an average Q3 score of 78.1% (Sov 77.4%) on 55 CASP5 targets.
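The general idea of a nearest-neighbor lookup over profile-window patterns, scored by three-state accuracy (Q3), can be illustrated with a minimal Python sketch. It is not the PREDICT implementation: the abstract does not specify the window size, the distance measure, the number of neighbors, or how the first- and second-layer databases are combined, so the Euclidean distance, the majority vote, the zero padding, the toy two-column "profiles", and all function names below are assumptions made for illustration only.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def window_patterns(profile, half_width=1):
    """Turn a per-residue profile (list of rows) into one flat pattern per residue.

    Each pattern concatenates the rows at positions i-half_width .. i+half_width;
    positions beyond the chain ends are zero-padded (an assumption).
    """
    pad = [0.0] * len(profile[0])
    patterns = []
    for i in range(len(profile)):
        vec = []
        for j in range(i - half_width, i + half_width + 1):
            vec.extend(profile[j] if 0 <= j < len(profile) else pad)
        patterns.append(vec)
    return patterns


def predict_states(query_patterns, database, k=3):
    """Assign each query pattern the majority state of its k nearest database patterns.

    `database` is a list of (pattern, state) pairs with known secondary structure.
    """
    predicted = []
    for q in query_patterns:
        nearest = sorted(database, key=lambda entry: dist(q, entry[0]))[:k]
        votes = Counter(state for _, state in nearest)
        predicted.append(votes.most_common(1)[0][0])
    return "".join(predicted)


def q3_score(predicted, observed):
    """Percentage of residues whose predicted state (H, E or C) matches the observed one."""
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Toy pattern database: one chain with a hypothetical 2-column profile and known states.
db_profile = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
db_states = "HHHEEE"
database = list(zip(window_patterns(db_profile), db_states))

# Toy query chain with an invented profile and reference states.
query_profile = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
observed = "HHEE"

prediction = predict_states(window_patterns(query_profile), database)
print(prediction, q3_score(prediction, observed))
```

In a real setting the profile rows would be the 20-column position-specific scores produced by PSI-BLAST, and the database would hold patterns from many chains of known structure rather than a single toy chain.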
Pages: 1441-1449
Number of pages: 9