Pruned Machine Learning Models to Predict Aqueous Solubility

Times cited: 13
Authors
Perryman, Alexander L. [1 ]
Inoyama, Daigo [1 ]
Patel, Jimmy S. [1 ]
Ekins, Sean [4 ]
Freundlich, Joel S. [1 ,2 ,3 ]
Affiliations
[1] Rutgers State Univ, New Jersey Med Sch, Dept Pharmacol Physiol & Neurosci, Newark, NJ 07103 USA
[2] Rutgers State Univ, New Jersey Med Sch, Div Infect Dis, Dept Med, Newark, NJ 07103 USA
[3] Rutgers State Univ, New Jersey Med Sch, Ruy V Lourenco Ctr Study Emerging & Reemerging Pa, Newark, NJ 07103 USA
[4] Collaborat Chem Inc, Fuquay Varina, NC 27526 USA
Source
ACS OMEGA | 2020 / Vol. 5 / Issue 27
Funding
National Institutes of Health (NIH);
Keywords
ORGANIC-COMPOUNDS; DRUG DISCOVERY; DIVERSE SET; COEFFICIENT; BIOACTIVITY; STABILITY; AGREEMENT;
DOI
10.1021/acsomega.0c01251
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Solubility is a key metric for therapeutic compounds, and insoluble compounds cloud the accuracy of assays at all stages of chemical biology and drug discovery. Herein, we disclose naive Bayesian classifier models to predict aqueous solubility. Publicly accessible aqueous solubility data were used to create two full, or nonpruned, training sets. These two sets were also combined to create a full fused set, and a training set composed of a literature collation of solubility data was also considered as a reference. We tested different extents of data pruning on the training sets and constructed machine learning models that were evaluated with two independent, external test sets containing compounds that differed from those in the training sets. The best pruned and fused model predicted these external test sets significantly more accurately than either the full model or the full fused model. By carefully removing data from the training set, less information can be used to create more accurate machine learning models for aqueous solubility. This knowledge and the curated training sets should prove useful to future machine learning approaches.
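The abstract combines three ideas: a naive Bayesian classifier, a "fused" training set built from two sources, and pruning of training data. The following is a minimal, dependency-free sketch of how those pieces fit together, not the authors' method. It assumes binary fingerprint-style features, a hypothetical soluble/insoluble cutoff of log S = -4.0, fusion by simple concatenation, and a hypothetical pruning rule that drops training compounds whose measured log S lies near the cutoff. The record does not state the actual descriptors, cutoff, or pruning criterion, and the toy data are invented.

```python
import math

# Hypothetical soluble/insoluble cutoff on log S (log mol/L); an assumption, not from the paper.
THRESHOLD = -4.0


def to_label(log_s, threshold=THRESHOLD):
    """1 = soluble (log S at or above the cutoff), 0 = insoluble."""
    return 1 if log_s >= threshold else 0


def prune(data, threshold=THRESHOLD, margin=0.5):
    """Assumed pruning rule: drop compounds whose log S lies within +/- margin of the cutoff."""
    return [(bits, log_s) for bits, log_s in data if abs(log_s - threshold) > margin]


class BernoulliNaiveBayes:
    """Naive Bayes over binary features with Laplace (add-one) smoothing."""

    def fit(self, data, n_bits, threshold=THRESHOLD):
        self.n_bits = n_bits
        counts = {0: [0] * n_bits, 1: [0] * n_bits}
        totals = {0: 0, 1: 0}
        for bits, log_s in data:
            y = to_label(log_s, threshold)
            totals[y] += 1
            for b in bits:
                counts[y][b] += 1
        n = totals[0] + totals[1]
        self.log_prior = {y: math.log((totals[y] + 1) / (n + 2)) for y in (0, 1)}
        # Smoothed estimate of P(feature b is on | class y).
        self.p_on = {
            y: [(counts[y][b] + 1) / (totals[y] + 2) for b in range(n_bits)]
            for y in (0, 1)
        }
        return self

    def predict(self, bits):
        """Return the class with the larger log posterior (up to a shared constant)."""
        scores = {}
        for y in (0, 1):
            score = self.log_prior[y]
            for b in range(self.n_bits):
                p = self.p_on[y][b]
                score += math.log(p if b in bits else 1.0 - p)
            scores[y] = score
        return max(scores, key=scores.get)


# Invented toy data: (indices of "on" fingerprint bits, measured log S).
set_a = [({0, 1}, -2.0), ({0}, -1.5), ({1, 2}, -3.9), ({2, 3}, -6.0)]
set_b = [({0, 1}, -3.0), ({3}, -5.5), ({2, 3}, -7.0), ({0, 3}, -4.1)]

fused = set_a + set_b          # hypothetical "fused" set: plain concatenation
pruned_fused = prune(fused)    # drops the two compounds near the cutoff
model = BernoulliNaiveBayes().fit(pruned_fused, n_bits=4)

print(len(fused), len(pruned_fused))  # 8 6
print(model.predict({0, 1}))          # 1 (predicted soluble)
print(model.predict({2, 3}))          # 0 (predicted insoluble)
```

In this sketch, pruning trades training-set size for cleaner class labels near the decision boundary; whether that trade improves accuracy on external test sets is the empirical question the abstract reports on.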
Pages: 16562-16567
Number of pages: 6