ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes

Cited by: 10
Authors
Wilary, Damian M. [1]
Cole, Jacqueline M. [1, 2]
Affiliations
[1] Univ Cambridge, Dept Phys, Cavendish Lab, Cambridge CB3 0HE, England
[2] STFC Rutherford Appleton Lab, ISIS Neutron & Muon Source, Harwell Sci & Innovat Campus, Didcot OX11 0QX, Oxon, England
Keywords
INFORMATION; CLIDE; TOOL;
DOI
10.1021/acs.jcim.3c00422
Chinese Library Classification number
R914 [Medicinal chemistry];
Discipline classification code
100701;
Abstract
Knowledge in the chemical domain is often disseminated graphically via chemical reaction schemes. The task of describing chemical transformations is greatly simplified by introducing reaction schemes that are composed of chemical diagrams and symbols. While intuitively understood by any chemist, like most graphical representations, such drawings are not easily understood by machines; this poses a challenge in the context of data extraction. Currently available tools are limited in their scope of extraction and require manual preprocessing, thus slowing down the speed of data extraction. We present a new tool, ReactionDataExtractor v2.0, which uses a combination of neural networks and symbolic artificial intelligence to effectively remove this barrier. We have evaluated our tool on a test set composed of reaction schemes that were taken from open-source journal articles and realized F1 score metrics between 75 and 96%. These evaluation metrics can be further improved by tuning our object-detection models to a specific chemical subdomain thanks to a data-driven approach that we have adopted with synthetically generated data. The system architecture of our tool is modular, which allows it to balance speed and accuracy to afford an autonomous, high-throughput solution for image-based chemical data extraction.
Pages: 6053-6067
Number of pages: 15