Protein pKa Prediction by Tree-Based Machine Learning

Times cited: 18
Authors
Chen, Ada Y. [1, 2]
Lee, Juyong [3]
Damjanovic, Ana [4]
Brooks, Bernard R. [2]
Affiliations
[1] Johns Hopkins Univ, Dept Phys & Astron, Baltimore, MD 21218 USA
[2] NHLBI, Lab Computat Biol, NIH, Bldg 10, Bethesda, MD 20892 USA
[3] Kangwon Natl Univ, Dept Chem, Div Chem & Biochem, Chunchon 24341, South Korea
[4] Johns Hopkins Univ, Dept Biophys, Baltimore, MD 21218 USA
Funding
National Research Foundation of Singapore; US National Institutes of Health;
Keywords
PH MOLECULAR-DYNAMICS; POISSON-BOLTZMANN EQUATION; SMOOTH DIELECTRIC FUNCTION; CONSTANT-PH; EXPLICIT SOLVENT; HYDROPHOBIC INTERIOR; IONIZABLE RESIDUES; STRUCTURAL-CHANGES; PROTEIN PK(A); CONFORMATIONAL FLEXIBILITY;
DOI
10.1021/acs.jctc.1c01257
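Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification code
070304; 081704;
Abstract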
Protonation states of ionizable protein residues modulate many essential biological processes. For correct modeling and understanding of these processes, it is crucial to accurately determine their pKa values. Here, we present four tree-based machine learning models for protein pKa prediction. The four models, Random Forest, Extra Trees, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), were trained on three experimental PDB and pKa datasets, two of which included a notable portion of internal residues. We observed similar performance among the four machine learning algorithms. The best model trained on the largest dataset performs 37% better than the widely used empirical pKa prediction tool PROPKA and 15% better than the published result from the pKa prediction method DelPhiPKa. The overall root-mean-square error (RMSE) for this model is 0.69, with surface and buried RMSE values being 0.56 and 0.78, respectively, considering six residue types (Asp, Glu, His, Lys, Cys, and Tyr), and 0.63 when considering Asp, Glu, His, and Lys only. We provide pKa predictions for proteins in the human proteome from the AlphaFold Protein Structure Database and observe that 1% of Asp/Glu/Lys residues have highly shifted pKa values close to the physiological pH.
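The abstract reports accuracy as RMSE and as a percentage improvement over other tools. As a minimal sketch of how such figures are commonly computed, the Python below defines the standard RMSE and a relative RMSE reduction. It assumes that "X% better" means a relative reduction in RMSE, which the abstract does not state explicitly. The function names and all pKa values are hypothetical and are not taken from the paper.

```python
import math


def rmse(predicted, experimental):
    """Root-mean-square error between predicted and experimental pKa values."""
    if len(predicted) != len(experimental) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, model_rmse):
    """Fractional RMSE reduction of a model relative to a baseline.

    Assumption: this is one plausible reading of "performs X% better";
    the abstract does not define the comparison.
    """
    return (baseline_rmse - model_rmse) / baseline_rmse


# Hypothetical pKa values, for illustration only (not data from the paper).
experimental = [3.9, 4.3, 6.5, 10.4]
model_pred = [4.2, 4.0, 7.1, 10.0]     # hypothetical model predictions
baseline_pred = [4.9, 3.5, 5.2, 11.3]  # hypothetical baseline predictions

model_rmse = rmse(model_pred, experimental)
baseline_rmse = rmse(baseline_pred, experimental)
print(f"model RMSE:    {model_rmse:.2f}")
print(f"baseline RMSE: {baseline_rmse:.2f}")
print(f"improvement:   {relative_improvement(baseline_rmse, model_rmse):.0%}")
```

Restricting the input lists to a subset of residues (for example, surface only or buried only) would give per-subset values analogous to the surface and buried RMSE figures quoted above.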
Pages: 2673-2686
Number of pages: 14