Extreme Gradient Boosting Combined with Conformal Predictors for Informative Solubility Estimation

Authors
Jovic, Ozren [1]
Mouras, Rabah [1]
Affiliations
[1] Univ Limerick, Bernal Inst, Pharmaceut Mfg Technol Ctr, Dept Chem Sci, Limerick V94 T9PX, Ireland
Source
MOLECULES | 2024 / Vol. 29 / Issue 01
Keywords
solubility; machine learning; extreme gradient boosting; variable selection; conformal predictor; prediction interval; applicability domain; molecular descriptor; AQUEOUS SOLUBILITY; APPLICABILITY DOMAIN; ORGANIC-COMPOUNDS; TRAINING SET; MODELS; SOLVENTS; DRUGS;
DOI
10.3390/molecules29010019
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
We used the extreme gradient boosting (XGB) algorithm to predict the experimental solubility of chemical compounds in water and organic solvents and to select significant molecular descriptors. In the first step, our forward stepwise top-importance XGB (FSTI-XGB) achieved, on curated solubility data sets, an RMSE of 0.59-0.76 Log(S) for two water data sets, 0.69-0.79 Log(S) for the Methanol data set, 0.65-0.79 Log(S) for the Ethanol data set, and 0.62-0.70 Log(S) for the Acetone data set. In the second step, we used uncurated and curated AquaSolDB data sets for applicability domain (AD) tests of the Drugbank, PubChem, and COCONUT databases and determined that more than 95% of the ca. 500,000 compounds studied were within the AD. In the third step, we applied conformal prediction to obtain narrow prediction intervals, which we successfully validated against the true solubility values of the test sets. In the fourth and final step, we used these prediction intervals to estimate individual error margins and the accuracy class of the solubility prediction for molecules within the AD of the three public databases, without any knowledge of experimental solubilities for those databases. We consider this four-step combination novel because solubility-related works usually study only the first step or the first two steps.
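The conformal prediction step mentioned in the abstract can be illustrated with a minimal split-conformal sketch. It is not the authors' implementation: the abstract does not say which conformal variant was used, so the basic absolute-residual version is assumed here. To keep the example dependency-free, a one-descriptor least-squares line stands in for the XGB regressor, and the data are synthetic. The names `fit_line`, `conformal_halfwidth`, and `make_data` are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (stand-in for the XGB regressor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def conformal_halfwidth(abs_residuals, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest calibration residual.

    If the calibration set is too small for the requested level,
    the interval is unbounded.
    """
    n = len(abs_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(abs_residuals)[k - 1]


rng = random.Random(0)


def make_data(n):
    # Synthetic data (assumption): one hypothetical descriptor x and a
    # noisy linear log-solubility response; not taken from the paper.
    xs = [rng.uniform(-2.0, 6.0) for _ in range(n)]
    ys = [0.5 - 0.9 * x + rng.gauss(0.0, 0.7) for x in xs]
    return xs, ys


x_train, y_train = make_data(300)  # fit the model
x_cal, y_cal = make_data(200)      # calibrate the interval width
x_test, y_test = make_data(500)    # check empirical coverage

a, b = fit_line(x_train, y_train)


def predict(x):
    return a + b * x


alpha = 0.1  # target miscoverage, i.e. nominal 90% intervals
cal_residuals = [abs(y - predict(x)) for x, y in zip(x_cal, y_cal)]
q = conformal_halfwidth(cal_residuals, alpha)

intervals = [(predict(x) - q, predict(x) + q) for x in x_test]
covered = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, y_test))
coverage = covered / len(y_test)

print(f"half-width: {q:.2f} log units, test coverage: {coverage:.3f}")
```

In this basic variant every molecule receives the same half-width `q`. Per-molecule error margins of the kind the abstract describes would require a normalized variant that scales `q` by a difficulty estimate, which is not shown here.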
Pages: 28