Structural protein fold recognition based on secondary structure and evolutionary information using machine learning algorithms

Times cited: 8
Authors
Qin, Xinyi [1]
Liu, Min [1]
Zhang, Lu [1]
Liu, Guangzhong [1]
Affiliation
[1] Shanghai Maritime Univ, Coll Informat Engn, Shanghai 201306, Peoples R China
Keywords
Protein fold recognition; ASTRAL; Secondary structure; Evolutionary information; Feature selection algorithm; IFS; PREDICTION; CLASSIFICATION; DATABASE
DOI
10.1016/j.compbiolchem.2021.107456
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
Understanding protein function supports research in advanced fields such as the gene therapy of diseases and the design and development of new drugs. A prerequisite for understanding a protein's function is determining its tertiary structure; protein structure classification is indispensable to this task, and fold recognition is a commonly used approach to structure classification. In this work, we study fold recognition on protein sequences from the 40% sequence-identity subset of the ASTRAL protein classification database, predicting 27 fold types, most of which belong to four structural classes: alpha, beta, alpha/beta and alpha + beta. We extract features from the protein primary structure using DSSP (secondary structure) and PSSM and HMM profiles (evolutionary information), converting each protein sequence into a feature vector that a machine learning algorithm can process. We then combine a LightGBM-based feature selection algorithm with incremental feature selection (IFS) to find the optimal classifier built by each of three tree-based machine learning algorithms: Random Forest, XGBoost and LightGBM. Finally, Bayesian optimization is used to tune the hyper-parameters of these algorithms, raising fold recognition accuracy to 93.45%. The proposed model achieves outstanding results in protein fold recognition.
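The abstract describes the pipeline only at a high level. The sketch below is a minimal, hypothetical illustration in pure Python of two common ways to turn variable-length per-residue data into fixed-length vectors, plus the IFS loop itself. The 8-to-3 DSSP state reduction, PSSM column averaging, the helper names and the toy scoring function are all assumptions, not the paper's exact feature set or code. In a real run, the feature ranking would come from a trained LightGBM model (for example the `feature_importances_` attribute of a fitted scikit-learn-style LightGBM estimator), and `evaluate` would cross-validate a Random Forest, XGBoost or LightGBM classifier on the chosen subset.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed 8-to-3 state reduction of DSSP labels (a common convention, not
# necessarily the paper's): H/G/I -> helix (H), E/B -> strand (E), else coil (C).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def ss_composition(dssp: str) -> List[float]:
    """Return the fractions of helix, strand and coil in a DSSP label string."""
    reduced = [DSSP_TO_3.get(label, "C") for label in dssp]
    n = len(reduced)
    return [reduced.count(state) / n for state in ("H", "E", "C")]


def pssm_column_means(pssm: Sequence[Sequence[float]]) -> List[float]:
    """Average each PSSM column over all residues (L x 20 -> 20 values)."""
    n = len(pssm)
    return [sum(row[j] for row in pssm) / n for j in range(len(pssm[0]))]


def incremental_feature_selection(
    ranked: Sequence[int],
    evaluate: Callable[[List[int]], float],
) -> Tuple[List[int], float, List[float]]:
    """Score the top-1, top-2, ... ranked features and keep the best subset.

    `ranked` lists feature indices from most to least important; `evaluate`
    returns a score (e.g. cross-validated accuracy) for a feature subset.
    Returns the best subset, its score and the full IFS curve.
    """
    best_subset: List[int] = []
    best_score = float("-inf")
    curve: List[float] = []
    for k in range(1, len(ranked) + 1):
        subset = list(ranked[:k])
        score = evaluate(subset)
        curve.append(score)
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_subset, best_score = subset, score
    return best_subset, best_score, curve


# Hypothetical toy evaluator: features 3 and 0 are "useful", and every
# extra feature costs 0.1, so the curve peaks at the top-2 subset.
def toy_score(subset: List[int]) -> float:
    return len({3, 0} & set(subset)) - 0.1 * len(subset)


print(ss_composition("HHHGEEB  T"))  # [0.4, 0.3, 0.3]
best, score, curve = incremental_feature_selection([3, 0, 2, 1], toy_score)
print(best, round(score, 2))  # [3, 0] 1.8
```

Scanning prefixes of a fixed importance ranking keeps IFS cheap (one evaluation per subset size) compared with searching all feature subsets; hyper-parameter tuning, which the abstract assigns to Bayesian optimization, would be a separate step run on the selected features.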
Pages: 10