ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes

Cited by: 10
Authors
Wilary, Damian M. [1]
Cole, Jacqueline M. [1, 2]
Affiliations
[1] Univ Cambridge, Dept Phys, Cavendish Lab, Cambridge CB3 0HE, England
[2] STFC Rutherford Appleton Lab, ISIS Neutron & Muon Source, Harwell Sci & Innovat Campus, Didcot OX11 0QX, Oxon, England
Keywords
INFORMATION; CLIDE; TOOL
DOI
10.1021/acs.jcim.3c00422
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Knowledge in the chemical domain is often disseminated graphically via chemical reaction schemes. The task of describing chemical transformations is greatly simplified by introducing reaction schemes that are composed of chemical diagrams and symbols. While intuitively understood by any chemist, like most graphical representations, such drawings are not easily understood by machines; this poses a challenge in the context of data extraction. Currently available tools are limited in their scope of extraction and require manual preprocessing, thus slowing down the speed of data extraction. We present a new tool, ReactionDataExtractor v2.0, which uses a combination of neural networks and symbolic artificial intelligence to effectively remove this barrier. We have evaluated our tool on a test set composed of reaction schemes that were taken from open-source journal articles and realized F1 score metrics between 75 and 96%. These evaluation metrics can be further improved by tuning our object-detection models to a specific chemical subdomain thanks to a data-driven approach that we have adopted with synthetically generated data. The system architecture of our tool is modular, which allows it to balance speed and accuracy to afford an autonomous, high-throughput solution for image-based chemical data extraction.
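The F1 scores quoted in the abstract combine precision and recall into a single number. As a reminder of how such a figure is computed, here is a minimal sketch. The function name `f1_score` and the counts in the usage lines are hypothetical; they are not taken from the paper or from ReactionDataExtractor's evaluation code.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        # Convention chosen for this sketch: nothing predicted and nothing expected.
        return 0.0
    return 2 * true_pos / denominator


# Hypothetical counts for illustration only (not results from the paper).
example = f1_score(true_pos=8, false_pos=2, false_neg=2)
print(f"F1 = {example:.2f}")
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are high, which is why it is a common single-number summary for extraction tasks.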
Pages: 6053-6067
Page count: 15