Sequence-Based Prediction of DNA-Binding Residues in Proteins with Conservation and Correlation Information

Cited by: 52
Authors
Ma, Xin [1, 2]
Guo, Jing [1]
Liu, Hong-De [1]
Xie, Jian-Ming [1]
Sun, Xiao [1]
Affiliations
[1] Southeast Univ, State Key Lab Bioelect, Sch Biol Sci & Med Engn, Nanjing, Jiangsu, Peoples R China
[2] Nanjing Audit Univ, Nanjing, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DNA-binding residues; random forest; physicochemical property; evolutionary information; web server; sites; identification; evolutionary; parameters; discovery; tool
DOI
10.1109/TCBB.2012.106
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Recognizing DNA-binding residues in proteins is critical to understanding the mechanisms of DNA-protein interactions and gene expression, and to guiding drug design. This study therefore proposes DNABR (DNA Binding Residues), a method that predicts DNA-binding residues in protein sequences using a random forest (RF) classifier with sequence-based features. Two novel types of sequence features are introduced: one captures the conservation of the physicochemical properties of amino acids, and the other captures the correlation of amino acids between different sequence positions in terms of physicochemical properties. The first type combines evolutionary information with the conservation of physicochemical properties of the amino acids, while the second reflects the dependency effect of amino acids with regard to polarity-charge and hydrophobic properties in the protein sequences. These two features, together with an orthogonal binary vector that reflects the characteristics of the 20 types of amino acids, are used to build the DNABR model for predicting DNA-binding residues in proteins. The DNABR model achieves a Matthews correlation coefficient (MCC) of 0.6586 and an overall accuracy (ACC) of 93.04 percent, with a sensitivity (SE) of 68.47 percent and a specificity (SP) of 98.16 percent. Comparisons among the individual features demonstrate that the two novel features contribute most to the improvement in predictive ability. Furthermore, performance comparisons with other approaches clearly show that DNABR achieves excellent prediction performance in detecting binding residues in putative DNA-binding proteins. The DNABR web-server system is freely available at http://www.cbi.seu.edu.cn/DNABR/.
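The four evaluation measures quoted in the abstract (SE, SP, ACC, MCC) are standard functions of a binary confusion matrix. The following is a minimal sketch of how they are usually computed, not the authors' evaluation code; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute SE, SP, ACC and MCC from confusion-matrix counts.

    tp/fn: binding residues predicted as binding / non-binding.
    tn/fp: non-binding residues predicted as non-binding / binding.
    """
    total = tp + tn + fp + fn
    # Sensitivity: fraction of true binding residues that are recovered.
    se = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-binding residues correctly rejected.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # Overall accuracy over all residues.
    acc = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts chosen only for illustration (not the paper's data).
m = binary_metrics(tp=70, tn=480, fp=20, fn=30)
print({name: round(value, 4) for name, value in m.items()})
```

Because non-binding residues typically far outnumber binding residues, accuracy alone can look high even when sensitivity is modest, which is why MCC is usually reported alongside ACC, SE, and SP.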
Pages: 1766-1775
Page count: 10