Pruned Machine Learning Models to Predict Aqueous Solubility

Times cited: 13
Authors
Perryman, Alexander L. [1 ]
Inoyama, Daigo [1 ]
Patel, Jimmy S. [1 ]
Ekins, Sean [4 ]
Freundlich, Joel S. [1 ,2 ,3 ]
Affiliations
[1] Rutgers State Univ, New Jersey Med Sch, Dept Pharmacol Physiol & Neurosci, Newark, NJ 07103 USA
[2] Rutgers State Univ, New Jersey Med Sch, Div Infect Dis, Dept Med, Newark, NJ 07103 USA
[3] Rutgers State Univ, New Jersey Med Sch, Ruy V Lourenco Ctr Study Emerging & Reemerging Pa, Newark, NJ 07103 USA
[4] Collaborat Chem Inc, Fuquay Varina, NC 27526 USA
Source
ACS OMEGA | 2020 / Volume 5 / Issue 27
Funding
National Institutes of Health (USA);
Keywords
ORGANIC-COMPOUNDS; DRUG DISCOVERY; DIVERSE SET; COEFFICIENT; BIOACTIVITY; STABILITY; AGREEMENT;
DOI
10.1021/acsomega.0c01251
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Solubility is a key metric for therapeutic compounds. Conversely, insoluble compounds cloud the accuracy of assays at all stages of chemical biology and drug discovery. Herein, we disclose naive Bayesian classifier models to predict aqueous solubility. Publicly accessible aqueous solubility data were used to create two full, or nonpruned, training sets. These two sets were also combined to create a full fused set, and a training set comprised of a literature collation of solubility data was also considered as a reference. We tested different extents of data pruning on the training sets and constructed machine learning models that were evaluated with two independent, external test sets that contained compounds that were different from the training sets. The best pruned and fused model was significantly more accurate, in comparison to either the full model or the full fused model, with the prediction of these external test sets. By carefully removing data from the training set, less information can be used to create more accurate machine learning models for aqueous solubility. This knowledge and the curated training sets should prove useful to future machine learning approaches.
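The idea of pruning a training set before fitting a naive Bayesian classifier can be illustrated with a minimal sketch. The abstract does not state the pruning rule, the descriptors, or the solubility cutoff, so everything specific here is an assumption for illustration only: compounds are represented as sets of "on" fingerprint bit indices, the class boundary is a hypothetical logS threshold of -4.0, and pruning is taken to mean dropping compounds whose measured logS lies within a hypothetical margin of that threshold. The function names are likewise invented for this sketch and are not the authors' code.

```python
import math

def prune_borderline(records, threshold=-4.0, margin=0.5):
    """Label compounds soluble (1) or insoluble (0) and drop those whose
    logS lies within `margin` of `threshold` (an assumed pruning rule)."""
    kept = []
    for bits, log_s in records:
        if abs(log_s - threshold) < margin:
            continue  # ambiguous value near the class boundary: pruned
        kept.append((bits, 1 if log_s >= threshold else 0))
    return kept

def train_bernoulli_nb(labeled, n_bits):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing."""
    counts = {0: [0] * n_bits, 1: [0] * n_bits}
    totals = {0: 0, 1: 0}
    for bits, label in labeled:
        totals[label] += 1
        for i in bits:
            counts[label][i] += 1
    n = len(labeled)
    model = {}
    for c in (0, 1):
        log_prior = math.log((totals[c] + 1) / (n + 2))
        probs = [(counts[c][i] + 1) / (totals[c] + 2) for i in range(n_bits)]
        model[c] = (log_prior, probs)
    return model

def predict(model, bits, n_bits):
    """Return the class with the highest log posterior score."""
    on = set(bits)
    scores = {}
    for c, (log_prior, probs) in model.items():
        score = log_prior
        for i in range(n_bits):
            p = probs[i]
            score += math.log(p if i in on else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)

# Toy data (hypothetical): (on-bit indices, measured logS)
records = [
    ({0, 1}, -2.0),
    ({0}, -1.5),
    ({2, 3}, -6.0),
    ({3}, -5.5),
    ({1, 2}, -4.1),  # within 0.5 of the threshold, so it is pruned
]
pruned = prune_borderline(records)
model = train_bernoulli_nb(pruned, n_bits=4)
```

In this toy setup the pruned model is trained on four of the five compounds. Whether pruning helps on real data would have to be checked against external test sets, as the abstract describes.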
Pages: 16562 / 16567
Page count: 6
Related papers
50 in total
  • [1] Evaluation of Machine Learning Models for Aqueous Solubility Prediction in Drug Discovery
    Xue, Nian
    Zhang, Yuzhu
    Liu, Sensen
    [J]. 2024 7TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND BIG DATA, ICAIBD 2024, 2024, : 26 - 33
  • [2] Machine Learning Enabled Models to Predict Sulfur Solubility in Nuclear Waste Glasses
    Xu, Xinyi
    Han, Taihao
    Huang, Jie
    Kruger, Albert A.
    Kumar, Aditya
    Goel, Ashutosh
    [J]. ACS APPLIED MATERIALS & INTERFACES, 2021, 13 (45) : 53375 - 53387
  • [3] Random forest models to predict aqueous solubility
    Palmer, David S.
    O'Boyle, Noel M.
    Glen, Robert C.
    Mitchell, John B. O.
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2007, 47 (01) : 150 - 158
  • [4] Evolving machine learning models to predict hydrogen sulfide solubility in the presence of various ionic liquids
    Amedi, Hamid Reza
    Baghban, Alireza
    Ahmadi, Mohammad Ali
    [J]. JOURNAL OF MOLECULAR LIQUIDS, 2016, 216 : 411 - 422
  • [5] Exploring the performance of machine learning models to predict carbon monoxide solubility in underground pure/saline water
    Vaferi, Behzad
    Dehbashi, Mohsen
    Alibak, Ali Hosin
    Yousefzadeh, Reza
    [J]. MARINE AND PETROLEUM GEOLOGY, 2024, 162
  • [6] Three machine learning models for the 2019 Solubility Challenge
    Mitchell, John B. O.
    [J]. ADMET AND DMPK, 2020, 8 (03): : 215 - +
  • [7] Machine learning models to predict sweetness of molecules
    Goel, Mansi
    Sharma, Aditi
    Chilwal, Ayush Singh
    Kumari, Sakshi
    Kumar, Ayush
    Bagler, Ganesh
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 152
  • [8] Can Machine Learning Models Predict Inflation?
    Ivascu, Codrut
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON BUSINESS EXCELLENCE, 2023, 17 (01): : 1748 - 1756
  • [9] MACHINE LEARNING MODELS TO PREDICT ASTHMA EXACERBATIONS
    Turcatel, Gianluca
    Xiao, Yi
    Caveney, Scott
    Gnacadja, Gilles
    Kim, Julie
    Molfino, Nestor
    [J]. CHEST, 2023, 164 (04) : 53A - 53A
  • [10] Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules
    Schroeter, Timon Sebastian
    Schwaighofer, Anton
    Mika, Sebastian
    Ter Laak, Antonius
    Suelzle, Detlev
    Ganzer, Ursula
    Heinrich, Nikolaus
    Mueller, Klaus-Robert
    [J]. JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 2007, 21 (12) : 651 - 664