ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning

Cited by: 20
Authors
Weir, Hayley [1 ,2 ]
Thompson, Keiran [1 ,2 ]
Woodward, Amelia [1 ]
Choi, Benjamin [3 ]
Braun, Augustin [1 ]
Martinez, Todd J. [1 ,2 ]
Affiliations
[1] Stanford Univ, Dept Chem, Stanford, CA 94305 USA
[2] SLAC Natl Accelerator Lab, 2575 Sand Hill Rd, Menlo Pk, CA 94025 USA
[3] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Keywords
CHEMICAL UNIVERSE; VIRTUAL EXPLORATION; EXTRACTION; MOLECULES; NETWORKS; LANGUAGE; CLIDE
DOI
10.1039/d1sc02957f
Chinese Library Classification (CLC)
O6 [Chemistry]
Discipline classification code
0703
Abstract
Inputting molecules into chemistry software, such as quantum chemistry packages, currently requires domain expertise, expensive software and/or cumbersome procedures. Leveraging recent breakthroughs in machine learning, we develop ChemPix: an offline, hand-drawn hydrocarbon structure recognition tool designed to remove these barriers. A neural image captioning approach consisting of a convolutional neural network (CNN) encoder and a long short-term memory (LSTM) decoder learned a mapping from photographs of hand-drawn hydrocarbon structures to machine-readable SMILES representations. We generated a large auxiliary training dataset, based on RDKit molecular images, by combining image augmentation, image degradation and background addition. Additionally, a small dataset of approximately 600 hand-drawn hydrocarbon chemical structures was crowd-sourced using a phone web application. These datasets were used to train the image-to-SMILES neural network with the goal of maximizing the hand-drawn hydrocarbon recognition accuracy. By forming a committee of the trained neural networks where each network casts one vote for the predicted molecule, we achieved a nearly 10 percentage point improvement in the molecule recognition accuracy and were able to assign a confidence value for the prediction based on the number of agreeing votes. The ensemble model achieved an accuracy of 76% on hand-drawn hydrocarbons, increasing to 86% if the top 3 predictions were considered.
Pages: 10622-10633
Page count: 12
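
The abstract describes a committee of trained image-to-SMILES networks in which each network casts one vote for the predicted molecule, and the number of agreeing votes serves as a confidence estimate. The following minimal Python sketch (not the authors' released code; the function name, tie-breaking behaviour and example SMILES are illustrative assumptions) shows one way such majority voting can be implemented:

from collections import Counter

def committee_predict(smiles_votes):
    # smiles_votes: list of SMILES strings, one prediction per ensemble member.
    # Majority vote: the most frequently predicted SMILES wins; the fraction of
    # agreeing votes acts as a confidence proxy, as described in the abstract.
    counts = Counter(smiles_votes)
    predicted_smiles, n_votes = counts.most_common(1)[0]
    confidence = n_votes / len(smiles_votes)
    return predicted_smiles, confidence

# Example: 7 of 10 hypothetical ensemble members agree on the same hydrocarbon.
votes = ["CC(C)CC=C"] * 7 + ["CC(C)C=CC"] * 2 + ["CCCCC=C"]
print(committee_predict(votes))  # ('CC(C)CC=C', 0.7)

A top-3 variant of this scheme (counting a prediction as correct if the true molecule appears among the three most-voted SMILES) corresponds to the 86% figure quoted in the abstract.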