Classification of conformational stability of protein mutants from 2D graph representation of protein sequences using support vector machines

Cited by: 5
Authors
Fernandez, M. [1]
Caballero, J.
Fernandez, L.
Abreu, J. I.
Acostas, G.
Affiliations
[1] Univ Matanzas, Ctr Biotechnol Studies, Fac Agron, Mol Modelling Grp, Matanzas 44740, Cuba
[2] Univ Talca, Ctr Bioinformat & Simulac Mol, Talca, Chile
[3] Univ Matanzas, Fac Informat, Artificial Intelligence Lab, Matanzas 44740, Cuba
[4] Natl Bioinformat Ctr, Havana 10200, Cuba
Keywords
protein stability prediction; point mutations; kernel-based methods; graph similarity;
DOI
10.1080/08927020701377070
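Chinese Library Classification (CLC) number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification code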
070304 ; 081704 ;
Abstract
Euclidean distance counts derived from protein 2D graphs were used to encode protein structural information. A total of 35 amino acid 2D distance count (AA2DC) descriptors were calculated from the Euclidean distance matrices (EDM) derived from the 2D graphs, at distances ranging from 0.05 to 1.8 units with a lag of 0.05 units. The AA2DC descriptors were tested for building a predictive classification model of the sign of the change in thermal unfolding Gibbs free energy (ΔΔG) on a large data set of 2048 single point mutations in 64 proteins. A support vector machine (SVM) classifier with a radial basis function (RBF) kernel was implemented to classify the conformational stability of protein mutants. The temperature and pH of the experimental ΔΔG measurements were also used for SVM training, in addition to the calculated AA2DC descriptors. The optimum SVM model correctly predicted about 72% of ΔΔG signs in cross-validation, both for the whole data set and for stable and unstable mutants separately. To the best of our knowledge, this level of accuracy for stable mutant recognition is the highest ever reported for a predictor using sequence information. Furthermore, the classifier adequately recognized disease-associated unstable mutants of human prion protein and human transthyretin.
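As a rough illustration of the two ingredients named in the abstract, the sketch below counts pairwise Euclidean distances between points of a 2D graph in bins of width 0.05 and evaluates a radial basis function kernel between two descriptor vectors. This is not the authors' implementation. The abstract does not say how residues are placed on the 2D graph or exactly how the counts are defined, so the input coordinates, the half-open binning over [0.05, 1.8), and the names `distance_counts` and `rbf_kernel` are all assumptions made for illustration.

```python
import math
from itertools import combinations


def distance_counts(coords, start=0.05, stop=1.8, lag=0.05):
    """Count point pairs whose Euclidean distance falls in each bin.

    Assumption: bins are half-open intervals [start + k*lag, start + (k+1)*lag).
    With the default values this gives 35 bins; whether this matches the
    paper's 35 AA2DC descriptors exactly is not confirmed by the abstract.
    """
    n_bins = round((stop - start) / lag)
    counts = [0] * n_bins
    for (x1, y1), (x2, y2) in combinations(coords, 2):
        d = math.hypot(x1 - x2, y1 - y2)
        k = math.floor((d - start) / lag)
        if 0 <= k < n_bins:  # pairs outside [start, stop) are ignored
            counts[k] += 1
    return counts


def rbf_kernel(u, v, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


# Hypothetical 2D coordinates for three residues (not taken from the paper).
example_coords = [(0.0, 0.0), (0.12, 0.0), (0.0, 0.27)]
example_counts = distance_counts(example_coords)
self_similarity = rbf_kernel(example_counts, example_counts)
```

In a full pipeline, descriptor vectors like `example_counts`, extended with the measurement temperature and pH, would be the inputs to the SVM classifier. `gamma` is a kernel hyperparameter that would normally be tuned by cross-validation.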
Pages: 889-896
Number of pages: 8
Related papers
50 in total
  • [41] H-L curve: A Novel 2D Graphical Representation of Protein Sequences
    Li, Yongfan
    Huang, Guohua
    Liao, Bo
    Liu, Zanbo
    [J]. MATCH-COMMUNICATIONS IN MATHEMATICAL AND IN COMPUTER CHEMISTRY, 2009, 61 (02) : 519 - 532
  • [42] UC-Curve: A Highly Compact 2D Graphical Representation of Protein Sequences
    Li, Yushuang
    Liu, Qian
    Zheng, Xiaoqi
    He, Ping-an
    [J]. INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY, 2014, 114 (06) : 409 - 415
  • [43] Similarity/Dissimilarity Studies of Protein Sequences Based on a New 2D Graphical Representation
    Yao, Yu-Hua
    Dai, Qi
    Li, Ling
    Nan, Xu-Ying
    He, Ping-An
    Zhang, Yao-Zhou
    [J]. JOURNAL OF COMPUTATIONAL CHEMISTRY, 2010, 31 (05) : 1045 - 1052
  • [44] Proteometric modelling of protein conformational stability using amino acid sequence autocorrelation vectors and genetic algorithm-optimised support vector machines
    Fernandez, Michael
    Fernandez, Leyden
    Sanchez, Pedro
    Caballero, Julio
    Abreu, Jose Ignacio
    [J]. MOLECULAR SIMULATION, 2008, 34 (09) : 857 - 872
  • [45] Classification of adulterated milk with the parameterization of 2D correlation spectroscopy and least squares support vector machines
    Yang, Renjie
    Liu, Rong
    Xu, Kexin
    Yang, Yanrong
    Dong, Guimei
    Zhang, Weiyu
    [J]. ANALYTICAL METHODS, 2013, 5 (21) : 5949 - 5953
  • [46] Effect of Protein Repetitiveness on Protein-Protein Interaction Prediction Results Using Support Vector Machines
    Zhou, Jie
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2017, 24 (02) : 183 - 192
  • [47] Detecting Succinylation sites from protein sequences using ensemble support vector machine
    Ning, Qiao
    Zhao, Xiaosa
    Bao, Lingling
    Ma, Zhiqiang
    Zhao, Xiaowei
    [J]. BMC BIOINFORMATICS, 2018, 19
  • [48] Protein Secondary Structure Prediction Using Support Vector Machines (SVMs)
    Patel, Mayuri
    Shah, Hitesh
    [J]. 2013 INTERNATIONAL CONFERENCE ON MACHINE INTELLIGENCE AND RESEARCH ADVANCEMENT (ICMIRA 2013), 2013, : 594 - 598
  • [49] Predicting Protein Subcellular Localization using PsePSSM and Support Vector Machines
    Juan, Eric Y. T.
    Jhang, J. H.
    Li, W. J.
    [J]. PROCEEDINGS OF THE 11TH JOINT CONFERENCE ON INFORMATION SCIENCES, 2008,
  • [50] Protein fold recognition using neural networks and support vector machines
    Jiang, N
    Wu, WXY
    Mitchell, I
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2005, PROCEEDINGS, 2005, 3578 : 462 - 469