Extreme Gradient Boosting Combined with Conformal Predictors for Informative Solubility Estimation

Authors
Jovic, Ozren [1]
Mouras, Rabah [1]
Affiliations
[1] Univ Limerick, Bernal Inst, Pharmaceut Mfg Technol Ctr, Dept Chem Sci, Limerick V94 T9PX, Ireland
Source
MOLECULES | 2024 / Vol. 29 / Issue 01
Keywords
solubility; machine learning; extreme gradient boosting; variable selection; conformal predictor; prediction interval; applicability domain; molecular descriptor; AQUEOUS SOLUBILITY; APPLICABILITY DOMAIN; ORGANIC-COMPOUNDS; TRAINING SET; MODELS; SOLVENTS; DRUGS;
DOI
10.3390/molecules29010019
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
We used the extreme gradient boosting (XGB) algorithm to predict the experimental solubility of chemical compounds in water and organic solvents and to select significant molecular descriptors. In the first step, the prediction accuracy of our forward stepwise top-importance XGB (FSTI-XGB) on curated solubility data sets, in terms of RMSE, was 0.59-0.76 Log(S) for two water data sets; for the organic solvent data sets it was 0.69-0.79 Log(S) for Methanol, 0.65-0.79 Log(S) for Ethanol, and 0.62-0.70 Log(S) for Acetone. In the second step, we used uncurated and curated AquaSolDB data sets for applicability domain (AD) tests of the Drugbank, PubChem, and COCONUT databases and determined that more than 95% of the ca. 500,000 compounds studied were within the AD. In the third step, we applied conformal prediction to obtain narrow prediction intervals and validated them against the true solubility values of the test sets. In the fourth and final step, we used these prediction intervals to estimate individual error margins and the accuracy class of the solubility prediction for molecules within the AD of the three public databases, without any knowledge of experimental solubilities for those databases. We consider this four-step workflow novel because solubility-related works usually address only the first step or the first two steps.
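The conformal prediction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which conformal variant or nonconformity score the authors used, so the code below assumes the simplest common choice: split (inductive) conformal prediction with absolute residuals on a held-out calibration set. The function name `split_conformal_interval`, the miscoverage level `alpha`, and the synthetic Log(S) values are all hypothetical and are not taken from the paper. This variant gives the same interval width for every compound; per-molecule error margins like those described in the abstract would need a normalized nonconformity score, which is not shown here.

```python
import numpy as np


def split_conformal_interval(cal_true, cal_pred, new_pred, alpha=0.2):
    """Symmetric prediction intervals from absolute calibration residuals.

    Assumption: split conformal prediction, not necessarily the paper's variant.
    """
    cal_true = np.asarray(cal_true, dtype=float)
    cal_pred = np.asarray(cal_pred, dtype=float)
    new_pred = np.asarray(new_pred, dtype=float)

    # Nonconformity score: absolute error on the calibration set, sorted ascending.
    scores = np.sort(np.abs(cal_true - cal_pred))
    n = scores.size

    # Finite-sample rank of the score used as the interval half-width.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        raise ValueError("calibration set too small for the requested alpha")
    q = scores[k - 1]

    return new_pred - q, new_pred + q


# Synthetic stand-in for model output: true Log(S) plus Gaussian model error.
rng = np.random.default_rng(0)
cal_true = rng.normal(-3.0, 1.5, size=200)
cal_pred = cal_true + rng.normal(0.0, 0.7, size=200)
test_true = rng.normal(-3.0, 1.5, size=1000)
test_pred = test_true + rng.normal(0.0, 0.7, size=1000)

lower, upper = split_conformal_interval(cal_true, cal_pred, test_pred, alpha=0.2)

# Fraction of test compounds whose true value falls inside its interval.
coverage = np.mean((test_true >= lower) & (test_true <= upper))
print(f"empirical coverage: {coverage:.3f}")
```

In practice `cal_pred` and `test_pred` would come from a fitted regressor such as an XGB model, and the empirical coverage on a test set with known solubilities serves as the validation check, which is the role the test sets play in the third step of the abstract.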
Pages: 28