Extreme Gradient Boosting Combined with Conformal Predictors for Informative Solubility Estimation

Authors
Jovic, Ozren [1]
Mouras, Rabah [1]
Affiliations
[1] Univ Limerick, Bernal Inst, Pharmaceut Mfg Technol Ctr, Dept Chem Sci, Limerick V94 T9PX, Ireland
Source
MOLECULES | 2024 / Vol. 29 / Issue 01
Keywords
solubility; machine learning; extreme gradient boosting; variable selection; conformal predictor; prediction interval; applicability domain; molecular descriptor; AQUEOUS SOLUBILITY; APPLICABILITY DOMAIN; ORGANIC-COMPOUNDS; TRAINING SET; MODELS; SOLVENTS; DRUGS;
DOI
10.3390/molecules29010019
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
We used the extreme gradient boosting (XGB) algorithm to predict the experimental solubility of chemical compounds in water and organic solvents and to select significant molecular descriptors. In the first step, the prediction accuracy of our forward stepwise top-importance XGB (FSTI-XGB) on curated solubility data sets, in terms of RMSE, was 0.59-0.76 Log(S) for two water data sets; for the organic solvent data sets it was 0.69-0.79 Log(S) for the Methanol data set, 0.65-0.79 Log(S) for the Ethanol data set, and 0.62-0.70 Log(S) for the Acetone data set. In the second step, we used uncurated and curated AquaSolDB data sets for applicability domain (AD) tests of the Drugbank, PubChem, and COCONUT databases and determined that more than 95% of the ca. 500,000 compounds studied were within the AD. In the third step, we applied conformal prediction to obtain narrow prediction intervals, which we successfully validated against the true solubility values of the test sets. In the fourth and last step, we used the prediction intervals to estimate individual error margins and the accuracy class of the solubility prediction for molecules within the AD of the three public databases. All of this was possible without knowledge of the experimental solubilities of the database compounds. We consider this four-step approach novel because solubility-related works usually study only the first step or the first two steps.
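The abstract does not say which conformal variant the authors used, so the following is only a minimal sketch of one common form, split (inductive) conformal regression with the absolute residual as the nonconformity score. Everything in it is an assumption made for illustration: the data are synthetic, a plain least-squares fit stands in for the XGB model, and the helper name `conformal_half_width` is hypothetical.

```python
import numpy as np


def conformal_half_width(cal_residuals, alpha=0.1):
    """Half-width q so that [pred - q, pred + q] targets (1 - alpha) coverage.

    Hypothetical helper: uses absolute calibration residuals as
    nonconformity scores (split conformal regression).
    """
    scores = np.sort(np.abs(np.asarray(cal_residuals, dtype=float)))
    n = scores.size
    # Finite-sample corrected rank (1-indexed): ceil((n + 1) * (1 - alpha))
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    if rank > n:
        # Too few calibration points to guarantee this coverage level
        return float("inf")
    return float(scores[rank - 1])


# Synthetic stand-in for descriptors (X) and Log(S) values (y); not real data.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
w_true = np.array([1.0, -0.5, 0.3, 0.0, 0.8])
y = X @ w_true + rng.normal(scale=0.7, size=600)

# Disjoint training, calibration, and test splits
X_train, y_train = X[:300], y[:300]
X_cal, y_cal = X[300:450], y[300:450]
X_test, y_test = X[450:], y[450:]

# Stand-in point predictor: ordinary least squares instead of XGB
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)


def predict(features):
    return features @ coef


# Calibrate the interval half-width on held-out residuals
q = conformal_half_width(y_cal - predict(X_cal), alpha=0.1)

# Prediction intervals for the test set and their empirical coverage
test_pred = predict(X_test)
lower, upper = test_pred - q, test_pred + q
coverage = float(np.mean((y_test >= lower) & (y_test <= upper)))
print(f"half-width: {q:.3f}, empirical coverage: {coverage:.3f}")
```

In this basic form every compound receives the same interval width. Obtaining individual error margins per molecule, as the abstract describes, would require a normalized nonconformity score or a similar refinement, which this sketch does not attempt.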
Pages: 28