Precise atom-to-atom mapping for organic reactions via human-in-the-loop machine learning

Cited by: 4
Authors
Chen S. [1, 2]
An S. [1, 2]
Babazade R. [3]
Jung Y. [1, 2, 4, 5]
Affiliations
[1] Department of Chemical and Biomolecular Engineering, KAIST, Daejeon
[2] Department of Chemical and Biological Engineering, Seoul National University, Seoul
[3] Graduate School of AI, KAIST, Daejeon
[4] Institute of Chemical Processes, Seoul National University, Seoul
[5] Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul
Funding
National Research Foundation of Singapore
DOI
10.1038/s41467-024-46364-y
Abstract
Atom-to-atom mapping (AAM) is the task of identifying the position of each atom in the molecules before and after a chemical reaction, which is important for understanding reaction mechanisms. As more machine learning (ML) models have recently been developed for retrosynthesis and reaction outcome prediction, the quality of these models has become highly dependent on the quality of the AAM in reaction datasets. Although there are algorithms that use graph theory or unsupervised learning to label the AAM of reaction datasets, existing methods map atoms based on substructure alignment rather than chemistry knowledge. Here, we present LocalMapper, an ML model that learns correct AAM from chemist-labeled reactions via human-in-the-loop machine learning. We show that LocalMapper can predict the AAM for 50K reactions with 98.5% calibrated accuracy by learning from human-labeled reactions amounting to only 2% of the entire dataset. More importantly, the confident predictions given by LocalMapper, which cover 97% of the 50K reactions, show 100% accuracy on 3,000 randomly sampled reactions. In an out-of-distribution experiment, LocalMapper performs favorably compared with other existing methods. We expect that LocalMapper can be used to generate more precise reaction AAM and improve the quality of future ML-based reaction prediction models. © The Author(s) 2024.
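To make "identifying the position of each atom before and after a reaction" concrete: in atom-mapped reaction SMILES, each bracket atom can carry a map number after a colon (e.g. `[CH3:1]`). Atoms that share a number on both sides of `>>` are declared to be the same atom. The following is a minimal, stdlib-only Python sketch of that convention. It is not LocalMapper's method. The helper names `atom_maps` and `check_mapping` and the regular expression are hypothetical, and the sketch assumes every mapped atom is written in brackets with a simple element symbol. It also shows why the abstract stresses chemistry knowledge. In acid-catalyzed (Fischer) esterification, isotope labeling shows that the ester oxygen comes from the alcohol. However, a mapping that takes it from the acid's OH still passes a purely structural consistency check.

```python
from __future__ import annotations

import re

# Hypothetical pattern for mapped bracket atoms such as "[CH3:1]" or "[13C:2]".
# Assumption: the element is one uppercase letter plus an optional lowercase
# letter, or a single lowercase letter for aromatic atoms; isotopes are skipped.
MAPPED_ATOM = re.compile(r"\[\d*([A-Z][a-z]?|[a-z])[^\]:]*:(\d+)\]")


def atom_maps(smiles: str) -> dict[int, str]:
    """Return {map_number: element} for every mapped atom in a SMILES string."""
    maps: dict[int, str] = {}
    for element, number in MAPPED_ATOM.findall(smiles):
        n = int(number)
        if n in maps:
            raise ValueError(f"map number {n} used twice")
        maps[n] = element.capitalize()  # aromatic "c" and aliphatic "C" both -> "C"
    return maps


def check_mapping(rxn: str) -> list[str]:
    """List structural problems in a mapped 'reactants>>products' string."""
    reactants, products = rxn.split(">>")
    r_maps, p_maps = atom_maps(reactants), atom_maps(products)
    problems = []
    for n, element in sorted(p_maps.items()):
        if n not in r_maps:
            problems.append(f"product atom {n} has no reactant origin")
        elif r_maps[n] != element:
            problems.append(f"atom {n} changes element {r_maps[n]} -> {element}")
    return problems


REACTANTS = "[CH3:1][OH:2].[CH3:5][C:3](=[O:4])[OH:6]"  # methanol + acetic acid

# Ester oxygen mapped to the alcohol oxygen (consistent with labeling studies).
chemical = REACTANTS + ">>[CH3:5][C:3](=[O:4])[O:2][CH3:1]"
# Ester oxygen mapped to the acid's OH: element-consistent, chemically wrong.
aligned = REACTANTS + ">>[CH3:5][C:3](=[O:4])[O:6][CH3:1]"
# Map numbers swapped between a carbon and an oxygen: structurally invalid.
broken = REACTANTS + ">>[CH3:5][C:3](=[O:4])[O:1][CH3:2]"

print(check_mapping(chemical))  # []
print(check_mapping(aligned))   # []
print(check_mapping(broken))
```

Both the chemically correct and the alignment-style mapping return an empty problem list. A rule-based validator like this can catch only malformed mappings. Telling the two plausible mappings apart requires the kind of chemist-provided labels the abstract describes.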