Protein pKa Prediction by Tree-Based Machine Learning

Cited by: 18
Authors
Chen, Ada Y. [1, 2]
Lee, Juyong [3]
Damjanovic, Ana [4]
Brooks, Bernard R. [2]
Affiliations
[1] Johns Hopkins Univ, Dept Phys & Astron, Baltimore, MD 21218 USA
[2] NHLBI, Lab Computat Biol, NIH, Bldg 10, Bethesda, MD 20892 USA
[3] Kangwon Natl Univ, Dept Chem, Div Chem & Biochem, Chunchon 24341, South Korea
[4] Johns Hopkins Univ, Dept Biophys, Baltimore, MD 21218 USA
Funding
National Research Foundation of Singapore; National Institutes of Health (USA)
Keywords
PH MOLECULAR-DYNAMICS; POISSON-BOLTZMANN EQUATION; SMOOTH DIELECTRIC FUNCTION; CONSTANT-PH; EXPLICIT SOLVENT; HYDROPHOBIC INTERIOR; IONIZABLE RESIDUES; STRUCTURAL-CHANGES; PROTEIN PK(A); CONFORMATIONAL FLEXIBILITY;
DOI
10.1021/acs.jctc.1c01257
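Chinese Library Classification (CLC) number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Discipline classification codes
070304; 081704
Abstract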
Protonation states of ionizable protein residues modulate many essential biological processes. For correct modeling and understanding of these processes, it is crucial to accurately determine their pKa values. Here, we present four tree-based machine learning models for protein pKa prediction. The four models, Random Forest, Extra Trees, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), were trained on three experimental PDB and pKa datasets, two of which included a notable portion of internal residues. We observed similar performance among the four machine learning algorithms. The best model trained on the largest dataset performs 37% better than the widely used empirical pKa prediction tool PROPKA and 15% better than the published result from the pKa prediction method DelPhiPKa. The overall root-mean-square error (RMSE) for this model is 0.69, with surface and buried RMSE values being 0.56 and 0.78, respectively, considering six residue types (Asp, Glu, His, Lys, Cys, and Tyr), and 0.63 when considering Asp, Glu, His, and Lys only. We provide pKa predictions for proteins in the human proteome from the AlphaFold Protein Structure Database and observed that 1% of Asp/Glu/Lys residues have highly shifted pKa values close to the physiological pH.
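The abstract reports accuracy as an overall RMSE, RMSEs split into surface and buried residues, and percentage improvements over PROPKA and DelPhiPKa. The sketch below shows how such figures can be computed from predicted and experimental pKa values. It is an illustration under stated assumptions, not the authors' evaluation code: the function names are hypothetical, the toy values are invented, and "X% better" is assumed to mean the fractional RMSE reduction relative to the baseline, a definition the abstract does not spell out.

```python
import math
from typing import Dict, Sequence


def rmse(pred: Sequence[float], expt: Sequence[float]) -> float:
    """Root-mean-square error between predicted and experimental pKa values."""
    if len(pred) != len(expt) or len(pred) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(pred))


def rmse_by_group(
    pred: Sequence[float], expt: Sequence[float], groups: Sequence[str]
) -> Dict[str, float]:
    """RMSE computed separately for each label in `groups` (e.g. 'surface', 'buried')."""
    if len(groups) != len(pred):
        raise ValueError("one group label is needed per residue")
    result = {}
    for label in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == label]
        result[label] = rmse([pred[i] for i in idx], [expt[i] for i in idx])
    return result


def relative_improvement(model_rmse: float, baseline_rmse: float) -> float:
    """Fractional RMSE reduction versus a baseline (ASSUMED meaning of 'X% better')."""
    return (baseline_rmse - model_rmse) / baseline_rmse


# Hypothetical toy data: four residues, two labelled surface and two buried.
expt = [3.9, 4.3, 6.5, 10.4]
pred = [3.5, 4.3, 6.0, 10.4]
groups = ["surface", "surface", "buried", "buried"]

print(round(rmse(pred, expt), 4))  # overall RMSE
print({k: round(v, 4) for k, v in rmse_by_group(pred, expt, groups).items()})
print(relative_improvement(0.75, 1.0))  # hypothetical model vs. baseline RMSE
```

Because the overall mean squared error is the count-weighted average of the per-group mean squared errors, a pooled overall RMSE always lies between the smallest and largest group RMSEs, which is consistent with the reported 0.69 falling between the surface (0.56) and buried (0.78) values.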
Pages: 2673-2686
Number of pages: 14