Three machine learning models for the 2019 Solubility Challenge

Cited by: 9
Authors
Mitchell, John B. O. [1 ,2 ]
Affiliations
[1] Univ St Andrews, EaStCHEM Sch Chem, St Andrews KY16 9ST, Fife, Scotland
[2] Univ St Andrews, Biomed Sci Res Complex, St Andrews KY16 9ST, Fife, Scotland
Source
ADMET AND DMPK | 2020 / Vol. 8 / Issue 03
Keywords
Aqueous intrinsic solubility; Solubility prediction; Random Forest; Extra Trees; Bagging; Consensus classifiers; Wisdom of Crowds; Inter-laboratory error; INTRINSIC AQUEOUS SOLUBILITY; DRUG SOLUBILITY; RANDOM FOREST; FREE-ENERGY; PREDICTION; SOLVATION; DISCOVERY;
DOI
10.5599/admet.835
Chinese Library Classification number
R914 [Medicinal chemistry]
Subject classification code
100701
Abstract
We describe three machine learning models submitted to the 2019 Solubility Challenge. All are founded on tree-like classifiers, with one model being based on Random Forest and another on the related Extra Trees algorithm. The third model is a consensus predictor combining the former two with a Bagging classifier. We call this consensus classifier Vox Machinarum, and here discuss how it benefits from the Wisdom of Crowds. On the first 2019 Solubility Challenge test set of 100 low-variance intrinsic aqueous solubilities, Extra Trees is our best classifier. On the other, a high-variance set of 32 molecules, we find that Vox Machinarum and Random Forest both perform a little better than Extra Trees, and almost equally well as one another. We also compare the gold standard solubilities from the 2019 Solubility Challenge with a set of literature-based solubilities for most of the same compounds.
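The abstract does not say how Vox Machinarum combines its three component predictions, so the following is only a minimal sketch of the general Wisdom of Crowds idea, assuming the consensus is a plain arithmetic mean. The three "models" here are synthetic noisy predictors (hypothetical stand-ins, not Random Forest, Extra Trees, or Bagging), and every number is generated for illustration rather than taken from the paper. The sketch shows one property that always holds for a mean consensus: its RMSE cannot exceed the average RMSE of its members.

```python
import math
import random
import statistics


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(
        statistics.fmean((p - o) ** 2 for p, o in zip(predicted, observed))
    )


def consensus(*model_predictions):
    """Per-compound arithmetic mean of the supplied prediction lists.

    ASSUMPTION: a simple mean; the source record does not state the rule
    actually used by Vox Machinarum.
    """
    return [statistics.fmean(values) for values in zip(*model_predictions)]


# Synthetic "observed" log-solubility values (illustrative only).
rng = random.Random(0)
observed = [rng.uniform(-7.0, -1.0) for _ in range(100)]


def noisy_model(noise_sd):
    """Hypothetical predictor: the observed value plus Gaussian noise."""
    return [y + rng.gauss(0.0, noise_sd) for y in observed]


model_a = noisy_model(0.9)
model_b = noisy_model(1.0)
model_c = noisy_model(1.1)
consensus_pred = consensus(model_a, model_b, model_c)

individual_rmse = [rmse(m, observed) for m in (model_a, model_b, model_c)]
consensus_rmse = rmse(consensus_pred, observed)

print("individual RMSEs:", [round(e, 3) for e in individual_rmse])
print("consensus RMSE:  ", round(consensus_rmse, 3))
```

With independent errors, as in this toy setup, the consensus typically beats every individual member. Real models trained on the same data have correlated errors, so the gain is smaller, which is consistent with the abstract's report that the consensus was not the best performer on both test sets.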
Pages: 215 / +
Number of pages: 37