Discrimination of Thermophilic and Mesophilic Proteins Using Support Vector Machine and Decision Tree

Cited by: 2
Authors
Ai, Haixin [1, 2]
Zhang, Li [1 ]
Zhang, Jikuan [3 ]
Cui, Tong [4 ]
Chang, Alan K. [1 ]
Liu, Hongsheng [1, 2]
Affiliations
[1] Liaoning Univ, Sch Life Sci, Shenyang, Liaoning, Peoples R China
[2] Res Ctr Comp Simulating & Informat Proc Biomacrom, Shenyang, Liaoning, Peoples R China
[3] Liaoning Univ, Sch Informat, Shenyang, Liaoning, Peoples R China
[4] Liaoning Prov Shiyan High Sch, Shenyang 110841, Liaoning, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Protein thermostability; support vector machine; decision tree; unbalanced data; dipeptide; amino acid; TARGET INTERACTION PREDICTION; THERMAL-STABILITY; T4 LYSOZYME; ENZYMES; THERMOSTABILITY; CLASSIFICATION; SEQUENCE
DOI
10.2174/1570164615666180718143606
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Enhancing the stability of proteins is vital to protein engineering and design. Manipulating protein stability also helps clarify the principles that govern protein thermostability, in both basic research and industrial application. Objective: To build models that can discriminate thermophilic from mesophilic proteins and to identify the factors influencing protein thermostability using machine learning methods. Method: A total of 613 protein features were calculated, and various feature selection algorithms were used to build feature subsets. Support vector machine and decision tree methods were applied to predict the thermostability of the proteins, and the problems caused by unbalanced data were addressed by using a grid search to find the best error-cost weights for the different classes. Results: Primary structure had a greater influence on protein thermostability than secondary structure. The best classification model was obtained when the support vector machine was run on the subset of amino acid composition plus amino acid class composition, which yielded a prediction accuracy of 84.07%. At the primary structure level, Gln, Glu, and Ser were the features that contributed most to protein thermostability. At the secondary structure level, Q_coil and Helix_E were the most important features affecting protein thermostability. Conclusion: These results suggest that the thermostability of a protein is mainly associated with its primary structural features.
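As a rough illustration of the best-performing feature subset named in the abstract (amino acid composition plus amino acid class composition), the sketch below computes both from a protein sequence. It is not the authors' code: the three-way grouping in AA_CLASSES and the function names are assumptions made for this example, since the abstract does not state which residue classes were used.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical physicochemical grouping, chosen only for illustration;
# the class definitions used in the paper are not given in the abstract.
AA_CLASSES = {
    "charged": "DEKR",
    "polar": "NQSTHCY",
    "nonpolar": "AGILMFPWV",
}


def aa_composition(seq):
    """Return the fraction of each standard residue in seq."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def class_composition(comp):
    """Sum residue fractions within each (assumed) residue class."""
    return {
        name: sum(comp[aa] for aa in members)
        for name, members in AA_CLASSES.items()
    }


def feature_vector(seq):
    """20 residue fractions followed by the class fractions."""
    comp = aa_composition(seq)
    classes = class_composition(comp)
    return [comp[aa] for aa in AMINO_ACIDS] + [classes[n] for n in AA_CLASSES]
```

Vectors built this way could then be passed to a classifier that supports per-class error costs, with those weights tuned by grid search as the abstract describes.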
Pages: 374-383
Number of pages: 10