Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies

Cited by: 132
Authors
Jorner, Kjell [1]
Brinck, Tore [2]
Norrby, Per-Ola [3]
Buttar, David [1]
Affiliations
[1] AstraZeneca, Early Chem Dev, Pharmaceut Sci, R&D, Macclesfield, Cheshire, England
[2] KTH Royal Inst Technol, Dept Chem, Appl Phys Chem, CBH, Stockholm, Sweden
[3] AstraZeneca, Data Sci & Modelling, Pharmaceut Sci, R&D, Gothenburg, Sweden
Funding
Swedish Research Council
Keywords
NUCLEOPHILIC-SUBSTITUTION; ELECTROSTATIC POTENTIALS; REACTIVITY; REGIOSELECTIVITY; CLASSIFICATION; EFFICIENT;
DOI
10.1039/d0sc04896h
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Accurate prediction of chemical reactions in solution is challenging for current state-of-the-art approaches based on transition state modelling with density functional theory. Models based on machine learning have emerged as a promising alternative to address these problems, but these models currently lack the precision to give crucial information on the magnitude of barrier heights, influence of solvents and catalysts and extent of regio- and chemoselectivity. Here, we construct hybrid models which combine the traditional transition state modelling and machine learning to accurately predict reaction barriers. We train a Gaussian Process Regression model to reproduce high-quality experimental kinetic data for the nucleophilic aromatic substitution reaction and use it to predict barriers with a mean absolute error of 0.77 kcal mol⁻¹ for an external test set. The model was further validated on regio- and chemoselectivity prediction on patent reaction data and achieved a competitive top-1 accuracy of 86%, despite not being trained explicitly for this task. Importantly, the model gives error bars for its predictions that can be used for risk assessment by the end user. Hybrid models emerge as the preferred alternative for accurate reaction prediction in the very common low-data situation where only 100-150 rate constants are available for a reaction class. With recent advances in deep learning for quickly predicting barriers and transition state geometries from density functional theory, we envision that hybrid models will soon become a standard alternative to complement current machine learning approaches based on ground-state physical organic descriptors or structural information such as molecular graphs or fingerprints.
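For readers who want to relate the reported barrier error to rate constants, the following is a minimal sketch using the Eyring equation. It assumes a transmission coefficient of 1 and a temperature of 298.15 K; the abstract does not state how barriers were derived from the kinetic data, so the conversion and the function names `barrier_from_rate` and `rate_ratio_for_error` are illustrative assumptions, not the authors' procedure.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal mol^-1 K^-1
K_B = 1.380649e-23     # Boltzmann constant in J K^-1
H = 6.62607015e-34     # Planck constant in J s


def barrier_from_rate(k, temperature=298.15):
    """Activation free energy (kcal/mol) from a rate constant via the Eyring equation.

    Assumes a transmission coefficient of 1. For a second-order rate
    constant, k is taken in M^-1 s^-1 with a 1 M standard state.
    """
    return -R_KCAL * temperature * math.log(k * H / (K_B * temperature))


def rate_ratio_for_error(error_kcal, temperature=298.15):
    """Multiplicative uncertainty in the rate implied by a barrier error (kcal/mol)."""
    return math.exp(error_kcal / (R_KCAL * temperature))


if __name__ == "__main__":
    # A rate constant of 1 s^-1 at room temperature (hypothetical input)
    print(f"Barrier for k = 1 s^-1: {barrier_from_rate(1.0):.2f} kcal/mol")
    # Rate factor corresponding to the 0.77 kcal/mol error quoted in the abstract
    print(f"Rate factor for 0.77 kcal/mol: {rate_ratio_for_error(0.77):.2f}")
```

Under these assumptions, a barrier error of 0.77 kcal/mol corresponds to a rate constant that is off by a factor of about 3.7 at room temperature.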
Pages: 1163-1175
Page count: 13
Related papers
50 in total
  • [41] Accurate prediction of essential proteins using ensemble machine learning
    Lu, Dezhi
    Wu, Hao
    Hou, Yutong
    Wu, Yuncheng
    Liu, Yuanyuan
    Wang, Jinwu
    CHINESE PHYSICS B, 2025, 34 (01)
  • [42] Accurate prediction of myopic progression and high myopia by machine learning
    Li, Jiahui
    Zeng, Simiao
    Li, Zhihuan
    Xu, Jie
    Sun, Zhuo
    Zhao, Jing
    Li, Meiyan
    Zou, Zixing
    Guan, Taihua
    Zeng, Jin
    Liu, Zhuang
    Xiao, Wenchao
    Wei, Ran
    Miao, Hanpei
    Ziyar, Ian
    Huang, Junxiong
    Gao, Yuanxu
    Zeng, Yangfa
    Zhou, Xing-Tao
    Zhang, Kang
    PRECISION CLINICAL MEDICINE, 2024, 7 (01)
  • [43] Accurate and fast machine learning algorithm for systems outage prediction
    Gu, Chan
    Chen, Chen
    Tang, Wei
    SOLAR ENERGY, 2023, 251 : 286 - 294
  • [44] DFT-Machine Learning Approach for Accurate Prediction of pKa
    Lawler, Robin
    Liu, Yao-Hao
    Majaya, Nessa
    Allam, Omar
    Ju, Hyunchul
    Kim, Jin Young
    Jang, Seung Soon
    JOURNAL OF PHYSICAL CHEMISTRY A, 2021, 125 (39) : 8712 - 8722
  • [45] ACCURATE THEORETICAL PREDICTION OF THE EXPERIMENTAL GROUND-STATE TOTAL ATOMIC ENERGIES
    KLOBUKOWSKI, M
    FRAGA, S
    PHYSICAL REVIEW A, 1988, 38 (03) : 1593 - 1594
  • [46] The era of big data: Genome-scale modelling meets machine learning
    Antonakoudis, Athanasios
    Barbosa, Rodrigo
    Kotidis, Pavlos
    Kontoravdi, Cleo
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2020, 18 : 3287 - 3300
  • [47] Machine Learning Prediction of Defect Formation Energies in a-SiO2
    Milardovich, Diego
    Jech, Markus
    Waldhoer, Dominic
    Waltl, Michael
    Grasser, Tibor
    2020 INTERNATIONAL CONFERENCE ON SIMULATION OF SEMICONDUCTOR PROCESSES AND DEVICES (SISPAD 2020), 2020 : 339 - 342
  • [48] Machine learning prediction of defect formation energies in a-SiO2
    Technische Universität Wien, Institute for Microelectronics, Gußhausstraße 27-29, Vienna 1040, Austria
    Unknown
    Int Conf Simul Semicond Process Dev Proc SISPAD, 2020 : 339 - 342
  • [49] Machine Learning Enabled Prediction of Stacking Fault Energies in Concentrated Alloys
    Arora, Gaurav
    Aidhy, Dilpuneet S.
    METALS, 2020, 10 (08) : 1 - 17
  • [50] Elraglusib response prediction and mechanistic discovery using iterative machine learning
    McDermott, Joseph
    Weiskittel, Taylor
    Billadeau, Daniel
    Carneiro, Benedito
    Li, Hu
    Mazar, Andrew
    CANCER RESEARCH, 2023, 83 (07)