Protein pKa Prediction by Tree-Based Machine Learning

Times cited: 18
Authors
Chen, Ada Y. [1, 2]
Lee, Juyong [3]
Damjanovic, Ana [4]
Brooks, Bernard R. [2]
Affiliations
[1] Johns Hopkins Univ, Dept Phys & Astron, Baltimore, MD 21218 USA
[2] NHLBI, Lab Computat Biol, NIH, Bldg 10, Bethesda, MD 20892 USA
[3] Kangwon Natl Univ, Dept Chem, Div Chem & Biochem, Chunchon 24341, South Korea
[4] Johns Hopkins Univ, Dept Biophys, Baltimore, MD 21218 USA
Funding
National Research Foundation, Singapore; National Institutes of Health (NIH), USA;
Keywords
PH MOLECULAR-DYNAMICS; POISSON-BOLTZMANN EQUATION; SMOOTH DIELECTRIC FUNCTION; CONSTANT-PH; EXPLICIT SOLVENT; HYDROPHOBIC INTERIOR; IONIZABLE RESIDUES; STRUCTURAL-CHANGES; PROTEIN PK(A); CONFORMATIONAL FLEXIBILITY;
DOI
10.1021/acs.jctc.1c01257
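Chinese Library Classification (CLC) number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification code
070304; 081704;
Abstract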
Protonation states of ionizable protein residues modulate many essential biological processes. For correct modeling and understanding of these processes, it is crucial to accurately determine their pKa values. Here, we present four tree-based machine learning models for protein pKa prediction. The four models, Random Forest, Extra Trees, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), were trained on three experimental PDB and pKa datasets, two of which included a notable portion of internal residues. We observed similar performance among the four machine learning algorithms. The best model trained on the largest dataset performs 37% better than the widely used empirical pKa prediction tool PROPKA and 15% better than the published result from the pKa prediction method DelPhiPKa. The overall root-mean-square error (RMSE) for this model is 0.69, with surface and buried RMSE values being 0.56 and 0.78, respectively, considering six residue types (Asp, Glu, His, Lys, Cys, and Tyr), and 0.63 when considering Asp, Glu, His, and Lys only. We provide pKa predictions for proteins in the human proteome from the AlphaFold Protein Structure Database and observed that 1% of Asp/Glu/Lys residues have highly shifted pKa values close to the physiological pH.
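The abstract reports accuracy as an overall RMSE, as separate RMSE values for surface and buried residues, and as percentage improvements over other tools. The sketch below shows one way such figures could be computed. It is illustrative only: the records are made-up values, not data from the paper, and the names `rmse`, `relative_improvement`, and `records` are hypothetical. It also assumes that "X% better" means a relative reduction in RMSE, which the abstract does not state explicitly.

```python
import math


def rmse(pairs):
    """Root-mean-square error over (predicted, experimental) pKa pairs."""
    if not pairs:
        raise ValueError("need at least one (predicted, experimental) pair")
    return math.sqrt(sum((p - e) ** 2 for p, e in pairs) / len(pairs))


def relative_improvement(rmse_model, rmse_baseline):
    """Fractional RMSE reduction of a model relative to a baseline.

    Assumption: this is how a "performs X% better" figure is defined.
    """
    return (rmse_baseline - rmse_model) / rmse_baseline


# Hypothetical records, NOT taken from the paper:
# (predicted pKa, experimental pKa, residue is buried?)
records = [
    (3.9, 4.0, False),
    (4.6, 4.4, False),
    (6.8, 6.5, False),
    (7.9, 7.0, True),
    (5.2, 6.0, True),
    (10.1, 10.4, True),
]

# Overall error, then the same metric restricted to each residue class.
overall = rmse([(p, e) for p, e, _ in records])
surface = rmse([(p, e) for p, e, buried in records if not buried])
buried = rmse([(p, e) for p, e, buried in records if buried])

print(f"overall RMSE: {overall:.2f}")
print(f"surface RMSE: {surface:.2f}")
print(f"buried RMSE:  {buried:.2f}")
```

Splitting the same metric by residue environment, as the abstract does, makes it visible when errors concentrate in buried residues rather than being spread evenly across the dataset.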
Pages: 2673-2686
Number of pages: 14