Improved DNA-Binding Protein Identification by Incorporating Evolutionary Information Into the Chou's PseAAC

Cited by: 30
Authors
Fu, Xiangzheng [1]
Zhu, Wen [1, 2]
Liao, Bo [1, 2]
Cai, Lijun [1]
Peng, Lihong [3]
Yang, Jialiang [2, 4]
Affiliations
[1] Hunan Univ, Coll Informat Sci & Engn, Changsha 410082, Hunan, Peoples R China
[2] Hainan Normal Univ, Sch Math & Stat, Haikou 570100, Peoples R China
[3] Hunan Univ Technol, Sch Comp Sci, Zhuzhou 412007, Peoples R China
[4] Icahn Sch Med Mt Sinai, Icahn Inst Genom & Multiscale Biol, New York, NY 10029 USA
Source
IEEE ACCESS | 2018 / Vol. 6
Keywords
DNA-binding protein identification; feature representation algorithm; evolutionary information; support vector machine; AMINO-ACID-COMPOSITION; PREDICT SUBCELLULAR-LOCALIZATION; ENSEMBLE CLASSIFIER; WEB SERVER; SEQUENCE; SITES; RNA; BIOINFORMATICS; GENERATION; PROMOTERS
DOI
10.1109/ACCESS.2018.2876656
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
DNA-binding proteins play critical roles in various cellular biological processes, such as gene expression and transcription. However, experimental methods for identifying these proteins, such as ChIP-sequencing, are expensive and time-consuming, which motivates in silico approaches, especially machine learning-based ones. In recent years, the accuracy of machine learning-based DNA-binding protein prediction has improved significantly. However, some critical problems remain, such as how to convert protein sequences into an appropriate discrete model or vector. In this paper, we propose a novel feature construction method based on the position-specific scoring matrix (PSSM), named K-PSSM-Composition. The proposed features efficiently capture information about the 20 amino acid residues and the local information of a given sequence during the evolutionary process. We perform recursive feature elimination to extract the optimal set of features, which is then used to train a support vector machine model for predicting DNA-binding proteins. We evaluate our predictor and compare it with other advanced predictors on two standard benchmark data sets. The proposed method achieves accuracies of 89.77% on the jackknife test and 88.71% on the independent test, outperforming the compared methods. This finding demonstrates the effectiveness of the proposed method in predicting DNA-binding proteins.
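The abstract does not define K-PSSM-Composition, so the sketch below is not the authors' method. It only illustrates the plain PSSM-composition idea that such features commonly build on: rows of the PSSM are grouped by the residue at each position and summed, giving a fixed-length 20 x 20 = 400-dimensional vector for a sequence of any length. The function name `pssm_composition`, the column order in `AMINO_ACIDS`, and the length normalization are assumptions made for illustration; feature selection and the SVM step are not shown.

```python
# Illustrative sketch only: generic PSSM composition, NOT the paper's
# K-PSSM-Composition (which the abstract does not specify).

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order of the PSSM


def pssm_composition(sequence, pssm):
    """Return a 400-value feature vector from a sequence and its PSSM.

    sequence: protein string of length L
    pssm: list of L rows, each holding 20 scores (one per amino acid column)
    Rows are summed per residue type, then divided by the sequence length.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    if len(sequence) != len(pssm):
        raise ValueError("sequence and PSSM must have the same length")

    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    totals = [[0.0] * 20 for _ in range(20)]

    for residue, row in zip(sequence, pssm):
        if len(row) != 20:
            raise ValueError("each PSSM row must have 20 scores")
        if residue not in index:
            continue  # skip non-standard residues such as 'X'
        group = totals[index[residue]]
        for j, score in enumerate(row):
            group[j] += score

    length = len(sequence)
    # Flatten the 20 x 20 matrix row by row into a 400-dimensional vector.
    return [value / length for group in totals for value in group]


if __name__ == "__main__":
    # Toy PSSM with constant rows, for demonstration only.
    toy_sequence = "ACA"
    toy_pssm = [[1.0] * 20, [2.0] * 20, [3.0] * 20]
    features = pssm_composition(toy_sequence, toy_pssm)
    print(len(features))
```

Because the output length is fixed at 400 regardless of sequence length, vectors of this kind can be passed directly to a feature-selection step and a classifier, which is the role the abstract assigns to recursive feature elimination and the support vector machine.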
Pages: 66545 - 66556
Page count: 12
Related papers
50 in total
  • [1] DPP-PseAAC: A DNA-binding protein prediction model using Chou's general PseAAC
    Rahman, M. Saifur
    Shatabda, Swakkhar
    Saha, Sanjay
    Kaykobad, M.
    Rahman, M. Sohel
    JOURNAL OF THEORETICAL BIOLOGY, 2018, 452 : 22 - 34
  • [2] PseDNA-Pro: DNA-Binding Protein Identification by Combining Chou's PseAAC and Physicochemical Distance Transformation
    Liu, Bin
    Xu, Jinghao
    Fan, Shixi
    Xu, Ruifeng
    Zhou, Jiyun
    Wang, Xiaolong
    MOLECULAR INFORMATICS, 2015, 34 (01) : 8 - 17
  • [3] Identification of protein subcellular localization via integrating evolutionary and physicochemical information into Chou's general PseAAC
    Shen, Yinan
    Tang, Jijun
    Guo, Fei
    JOURNAL OF THEORETICAL BIOLOGY, 2019, 462 : 230 - 239
  • [4] Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation
    Li, Chun
    Zhao, Jialing
    Wang, Changzhong
    Yao, Yuhua
    COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING, 2018, 21 (02) : 100 - 110
  • [5] Use Chou's 5-Step Rule to Predict DNA-Binding Proteins with Evolutionary Information
    Lu, Weizhong
    Song, Zhengwei
    Ding, Yijie
    Wu, Hongjie
    Cao, Yan
    Zhang, Yu
    Li, Haiou
    BIOMED RESEARCH INTERNATIONAL, 2020, 2020
  • [6] Identification of Heat Shock Protein families and J-protein types by incorporating Dipeptide Composition into Chou's general PseAAC
    Ahmad, Saeed
    Kabir, Muhammad
    Hayat, Maqsood
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2015, 122 (02) : 165 - 174
  • [7] Effective DNA binding protein prediction by using key features via Chou's general PseAAC
    Adilina, Sheikh
    Farid, Dewan Md
    Shatabda, Swakkhar
    JOURNAL OF THEORETICAL BIOLOGY, 2019, 460 : 64 - 78
  • [8] Predicting membrane protein types by incorporating a novel feature set into Chou's general PseAAC
    Sankari, E. Siva
    Manimegalai, D.
    JOURNAL OF THEORETICAL BIOLOGY, 2018, 455 : 319 - 328
  • [9] Incorporating Secondary Features into the General form of Chou's PseAAC for Predicting Protein Structural Class
    Liao, Bo
    Xiang, Qilin
    Li, Dachao
    PROTEIN AND PEPTIDE LETTERS, 2012, 19 (11) : 1133 - 1138
  • [10] Accurate prediction of protein structural classes by incorporating PSSS and PSSM into Chou's general PseAAC
    Zhang, Shengli
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2015, 142 : 28 - 35