Machine Learning-Guided Prediction of Cocrystals Using Point Cloud-Based Molecular Representation

Times cited: 5
Authors
Ahmadi, Soroush [1]
Ghanavati, Mohammad Amin [1]
Rohani, Sohrab [1]
Affiliation
[1] Western Univ, Chem & Biochem Engn, London, ON N6A 5B9, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Design for testability - Machine learning - Physicochemical properties - Salicylic acid - Synthesis (chemical);
DOI
10.1021/acs.chemmater.3c01437
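Chinese Library Classification (CLC) number
O64 [Physical Chemistry (Theoretical Chemistry), Chemical Physics]
Discipline classification codes
070304; 081704
Abstract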
The design and synthesis of cocrystals have emerged as promising crystal engineering strategies for enhancing the physicochemical properties of a diverse range of target molecules. A strategy that predicts whether a pair of target and auxiliary molecules will form a cocrystal can greatly accelerate cocrystal discovery. In this study, we compiled 12,776 molecules (6,388 cocrystals) and performed DFT calculations on all of them. All entries in the database were obtained from experimental attempts reported in the literature. Electrostatic potential (ESP) surfaces were then extracted from the DFT results and used to develop four machine learning models: PointNet, an artificial neural network (ANN), a random forest (RF), and an Ensemble model. The Ensemble model, which leverages the complementary strengths of the PointNet, ANN, and RF models, showed superior discriminatory performance on the unseen test subset, with a balanced accuracy (BACC) of 0.942 and an area under the ROC curve (AUC) of 0.986. To assess the models' performance on individual molecules, we separated the cocrystals of caffeine, fumaric acid, and salicylic acid from the overall database. The Ensemble model proved highly robust, classifying the 312 cocrystals in this subset into their respective classes with an average BACC of 98%. Furthermore, data analysis yielded 132 batches of cocrystal instances. After three batches were excluded, the proposed models were tested on these previously unseen molecules both before and after applying a batchwise retraining method.
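The abstract reports performance as balanced accuracy (BACC) but does not state how the Ensemble combines PointNet, ANN, and RF. As a minimal sketch, assuming an unweighted soft vote (averaging each model's predicted cocrystal probability) and a 0.5 decision threshold, neither of which is confirmed as the authors' method, the combination step and the BACC metric can be written as below. The function names `soft_vote` and `balanced_accuracy` and the toy scores are hypothetical and are not data from the study.

```python
from __future__ import annotations

from typing import Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> list[float]:
    """Average per-sample cocrystal probabilities from several models.

    Assumption: every model (e.g. PointNet, ANN, RF) gets equal weight;
    the abstract does not say how the Ensemble combines its members.
    """
    n_models = len(prob_lists)
    n_samples = len(prob_lists[0])
    if any(len(p) != n_samples for p in prob_lists):
        raise ValueError("all models must score the same samples")
    return [sum(p[i] for p in prob_lists) / n_models for i in range(n_samples)]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """BACC = (sensitivity + specificity) / 2.

    Labels: 1 = cocrystal forms, 0 = no cocrystal. Both classes must occur
    in y_true, otherwise one of the rates is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on cocrystal pairs
    specificity = tn / (tn + fp)  # recall on non-cocrystal pairs
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Hypothetical toy probabilities for four molecule pairs.
    pointnet = [0.9, 0.2, 0.7, 0.4]
    ann = [0.8, 0.3, 0.4, 0.1]
    rf = [0.7, 0.6, 0.6, 0.2]
    y_true = [1, 0, 1, 0]
    ensemble = soft_vote([pointnet, ann, rf])
    y_pred = [1 if p >= 0.5 else 0 for p in ensemble]  # assumed 0.5 threshold
    print(ensemble, y_pred, balanced_accuracy(y_true, y_pred))
```

BACC is used instead of plain accuracy because it weights the cocrystal and non-cocrystal classes equally, which matters when one outcome dominates the data. For real work, scikit-learn's `sklearn.metrics.balanced_accuracy_score` computes the same quantity for binary labels; the hand-written version only makes the definition explicit.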
Pages: 1153-1161
Page count: 9