Discrimination of Thermophilic and Mesophilic Proteins Using Support Vector Machine and Decision Tree

Authors
Ai, Haixin [1, 2]
Zhang, Li [1]
Zhang, Jikuan [3]
Cui, Tong [4]
Chang, Alan K. [1]
Liu, Hongsheng [1, 2]
Affiliations
[1] Liaoning Univ, Sch Life Sci, Shenyang, Liaoning, Peoples R China
[2] Res Ctr Comp Simulating & Informat Proc Biomacrom, Shenyang, Liaoning, Peoples R China
[3] Liaoning Univ, Sch Informat, Shenyang, Liaoning, Peoples R China
[4] Liaoning Prov Shiyan High Sch, Shenyang 110841, Liaoning, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Protein thermostability; support vector machine; decision tree; unbalanced data; dipeptide; amino acid; TARGET INTERACTION PREDICTION; THERMAL-STABILITY; T4 LYSOZYME; ENZYMES; THERMOSTABILITY; CLASSIFICATION; SEQUENCE
DOI
10.2174/1570164615666180718143606
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Enhancing the stability of proteins is vital to protein engineering and design. Manipulating protein stability is also important for understanding the principles that govern protein thermostability, both in basic research and in industrial applications.
Objective: To build models that discriminate between thermophilic and mesophilic proteins and to understand the factors influencing protein thermostability using machine learning methods.
Method: A total of 613 protein features were calculated, and various feature selection algorithms were used to build feature subsets. Support vector machine and decision tree methods were applied to predict the thermostability of the proteins, and the problems caused by unbalanced data were addressed by using a grid search to find the best error-cost weights for the different classes.
Results: Primary structure had a greater influence on protein thermostability than secondary structure. The best classification model was obtained when the support vector machine was run on the subset of amino acid composition plus amino acid class composition, which yielded a prediction accuracy of 84.07%. At the primary structure level, Gln, Glu, and Ser were the features that contributed most to protein thermostability. At the secondary structure level, Q_coil and Helix_E were the most important features affecting protein thermostability.
Conclusion: These results suggest that the thermostability of a protein is mainly associated with its primary structural features.
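The abstract names amino acid composition plus amino acid class composition as the best-performing feature subset. As a minimal sketch of what such features might look like, the Python below computes the fraction of each of the 20 standard residues in a sequence and sums those fractions over residue classes. The three-way grouping in `AA_CLASSES` and all function names are hypothetical and chosen only for illustration; the abstract does not state which classes or tools the authors used.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical residue classes for illustration only; the grouping used
# in the paper is not given in the abstract.
AA_CLASSES = {
    "charged": "DEKR",
    "polar": "NQSTHCY",
    "hydrophobic": "AVLIMFWPG",
}


def aa_composition(seq):
    """Return the fraction of each standard residue in seq."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def class_composition(seq):
    """Return the summed residue fraction for each class in AA_CLASSES."""
    comp = aa_composition(seq)
    return {
        name: sum(comp[aa] for aa in members)
        for name, members in AA_CLASSES.items()
    }


def feature_vector(seq):
    """Concatenate the 20 residue fractions and the class fractions."""
    comp = aa_composition(seq)
    classes = class_composition(seq)
    return [comp[aa] for aa in AMINO_ACIDS] + [classes[n] for n in AA_CLASSES]
```

Vectors of this kind could then be passed to a classifier. For the unbalanced-data step, one possible approach (an assumption, not the authors' stated tooling) would be to search over per-class error-cost weights, for example through the `class_weight` parameter of scikit-learn's `SVC`.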
Pages: 374-383
Number of pages: 10