Protein pKa Prediction by Tree-Based Machine Learning

Cited by: 18
Authors
Chen, Ada Y. [1, 2]
Lee, Juyong [3]
Damjanovic, Ana [4]
Brooks, Bernard R. [2]
Affiliations
[1] Johns Hopkins Univ, Dept Phys & Astron, Baltimore, MD 21218 USA
[2] NHLBI, Lab Computat Biol, NIH, Bldg 10, Bethesda, MD 20892 USA
[3] Kangwon Natl Univ, Dept Chem, Div Chem & Biochem, Chunchon 24341, South Korea
[4] Johns Hopkins Univ, Dept Biophys, Baltimore, MD 21218 USA
Funding
National Research Foundation, Singapore; National Institutes of Health (USA)
Keywords
PH MOLECULAR-DYNAMICS; POISSON-BOLTZMANN EQUATION; SMOOTH DIELECTRIC FUNCTION; CONSTANT-PH; EXPLICIT SOLVENT; HYDROPHOBIC INTERIOR; IONIZABLE RESIDUES; STRUCTURAL-CHANGES; PROTEIN PK(A); CONFORMATIONAL FLEXIBILITY;
DOI
10.1021/acs.jctc.1c01257
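Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification code
070304; 081704
Abstract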
Protonation states of ionizable protein residues modulate many essential biological processes. For correct modeling and understanding of these processes, it is crucial to accurately determine their pKa values. Here, we present four tree-based machine learning models for protein pKa prediction. The four models, Random Forest, Extra Trees, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), were trained on three experimental PDB and pKa datasets, two of which included a notable portion of internal residues. We observed similar performance among the four machine learning algorithms. The best model trained on the largest dataset performs 37% better than the widely used empirical pKa prediction tool PROPKA and 15% better than the published result from the pKa prediction method DelPhiPKa. The overall root-mean-square error (RMSE) for this model is 0.69, with surface and buried RMSE values being 0.56 and 0.78, respectively, considering six residue types (Asp, Glu, His, Lys, Cys, and Tyr), and 0.63 when considering Asp, Glu, His, and Lys only. We provide pKa predictions for proteins in human proteome from the AlphaFold Protein Structure Database and observed that 1% of Asp/Glu/Lys residues have highly shifted pKa values close to the physiological pH.
Pages: 2673-2686
Number of pages: 14