Sequence-Based Prediction of DNA-Binding Residues in Proteins with Conservation and Correlation Information

Cited by: 52
Authors
Ma, Xin [1, 2]
Guo, Jing [1]
Liu, Hong-De [1]
Xie, Jian-Ming [1]
Sun, Xiao [1]
Affiliations
[1] Southeast Univ, State Key Lab Bioelect, Sch Biol Sci & Med Engn, Nanjing, Jiangsu, Peoples R China
[2] Nanjing Audit Univ, Nanjing, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DNA-binding residues; random forest; physicochemical property; evolutionary information; WEB SERVER; SITES; IDENTIFICATION; EVOLUTIONARY; PARAMETERS; DISCOVERY; TOOL;
DOI
10.1109/TCBB.2012.106
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Recognizing DNA-binding residues in proteins is critical to understanding the mechanisms of DNA-protein interactions and gene expression, and to guiding drug design. This study therefore proposes DNABR (DNA Binding Residues), a method that predicts DNA-binding residues in protein sequences using a random forest (RF) classifier with sequence-based features. Two novel types of sequence feature are proposed. The first combines evolutionary information with the conservation of the physicochemical properties of amino acids. The second reflects the dependency between amino acids at different sequence positions in terms of their polarity-charge and hydrophobic properties. These two features, together with an orthogonal binary vector that encodes the 20 amino acid types, are used to build the DNABR model. DNABR achieves a Matthews correlation coefficient (MCC) of 0.6586 and an overall accuracy (ACC) of 93.04 percent, with a sensitivity (SE) of 68.47 percent and a specificity (SP) of 98.16 percent. Comparisons of the individual features demonstrate that the two novel features contribute most to the improvement in predictive ability. Furthermore, performance comparisons with other approaches clearly show that DNABR achieves excellent prediction performance in detecting binding residues in putative DNA-binding proteins. The DNABR web server is freely available at http://www.cbi.seu.edu.cn/DNABR/.
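The orthogonal binary encoding and the four reported metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the alphabetical residue ordering, the zero-padding at sequence ends, and the window half-width of 5 are all assumptions made for illustration, since the abstract does not specify them. The metric formulas are the standard confusion-matrix definitions of SE, SP, ACC, and MCC.

```python
import math

# The 20 standard amino acids by one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """Return a 20-dimensional orthogonal binary vector for one residue.

    Unknown residues (e.g. 'X') map to the all-zero vector.
    """
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def window_features(sequence, center, half_width=5):
    """Concatenate one-hot vectors over a window centered on `center`.

    Positions that fall outside the sequence are padded with zeros.
    `half_width` is a hypothetical parameter, not taken from the paper.
    """
    features = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < len(sequence):
            features.extend(one_hot(sequence[i]))
        else:
            features.extend([0] * len(AMINO_ACIDS))
    return features


def binary_metrics(tp, tn, fp, fn):
    """Compute SE, SP, ACC, and MCC from confusion-matrix counts."""
    se = tp / (tp + fn)                       # sensitivity
    sp = tn / (tn + fp)                       # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)     # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "ACC": acc, "MCC": mcc}
```

Because binding residues are typically far rarer than non-binding ones, ACC alone can look high even when SE is modest; MCC uses all four confusion-matrix counts and is therefore the more informative single number for such imbalanced data.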
Pages: 1766 - 1775
Number of pages: 10
Related papers
50 in total
  • [41] Using hidden Markov models to predict DNA-binding proteins with sequence and structure information
    Hsu, Yi-Yu
    Chen, Wei-Jhih
    Chen, Shu-Hui
    Kao, Hung-Yu
    SOFT COMPUTING, 2014, 18 (12) : 2365 - 2376
  • [42] Identification of DNA-Binding Proteins by Multiple Kernel Support Vector Machine and Sequence Information
    Ding, Yijie
    Chen, Feng
    Guo, Xiaoyi
    Tang, Jijun
    Wu, Hongjie
    CURRENT PROTEOMICS, 2020, 17 (04) : 302 - 310
  • [44] SEQUENCE-SPECIFIC DNA-BINDING BY MYC PROTEINS
    KERKHOFF, E
    BISTER, K
    KLEMPNAUER, KH
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1991, 88 (10) : 4323 - 4327
  • [45] Prediction of protein-binding residues: dichotomy of sequence-based methods developed using structured complexes versus disordered proteins
    Zhang, Jian
    Ghadermarzi, Sina
    Kurgan, Lukasz
    BIOINFORMATICS, 2020, 36 (18) : 4729 - 4738
  • [46] Sequence-Based Prediction of microRNA-Binding Residues in Proteins Using Cost-Sensitive Laplacian Support Vector Machines
    Wu, Jian-Sheng
    Zhou, Zhi-Hua
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2013, 10 (03) : 752 - 759
  • [47] Objective sequence-based subfamily classifications of mouse homeodomains reflect their in vitro DNA-binding preferences
    Santos, Miguel A.
    Turinsky, Andrei L.
    Ong, Serene
    Tsai, Jennifer
    Berger, Michael F.
    Badis, Gwenael
    Talukder, Shaheynoor
    Gehrke, Andrew R.
    Bulyk, Martha L.
    Hughes, Timothy R.
    Wodak, Shoshana J.
    NUCLEIC ACIDS RESEARCH, 2010, 38 (22) : 7927 - 7942
  • [48] SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues
    Yang, Xiaoxia
    Wang, Jia
    Sun, Jun
    Liu, Rong
    PLOS ONE, 2015, 10 (07):
  • [49] Mining sequence features for DNA-binding site prediction
    Hu, Jing
    Yan, Changhui
    2008 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2008, : 69 - 72
  • [50] A comprehensive comparative review of sequence-based predictors of DNA- and RNA-binding residues
    Yan, Jing
    Friedrich, Stefanie
    Kurgan, Lukasz
    BRIEFINGS IN BIOINFORMATICS, 2016, 17 (01) : 88 - 105