ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning

Cited by: 20
Authors
Weir, Hayley [1, 2]
Thompson, Keiran [1, 2]
Woodward, Amelia [1]
Choi, Benjamin [3]
Braun, Augustin [1]
Martinez, Todd J. [1, 2]
Affiliations
[1] Stanford Univ, Dept Chem, Stanford, CA 94305 USA
[2] SLAC Natl Accelerator Lab, 2575 Sand Hill Rd, Menlo Pk, CA 94025 USA
[3] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Keywords
CHEMICAL UNIVERSE; VIRTUAL EXPLORATION; EXTRACTION; MOLECULES; NETWORKS; LANGUAGE; CLIDE;
DOI
10.1039/d1sc02957f
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Inputting molecules into chemistry software, such as quantum chemistry packages, currently requires domain expertise, expensive software and/or cumbersome procedures. Leveraging recent breakthroughs in machine learning, we develop ChemPix: an offline, hand-drawn hydrocarbon structure recognition tool designed to remove these barriers. A neural image captioning approach consisting of a convolutional neural network (CNN) encoder and a long short-term memory (LSTM) decoder learned a mapping from photographs of hand-drawn hydrocarbon structures to machine-readable SMILES representations. We generated a large auxiliary training dataset, based on RDKit molecular images, by combining image augmentation, image degradation and background addition. Additionally, a small dataset of approximately 600 hand-drawn hydrocarbon chemical structures was crowd-sourced using a phone web application. These datasets were used to train the image-to-SMILES neural network with the goal of maximizing the hand-drawn hydrocarbon recognition accuracy. By forming a committee of the trained neural networks where each network casts one vote for the predicted molecule, we achieved a nearly 10 percentage point improvement of the molecule recognition accuracy and were able to assign a confidence value for the prediction based on the number of agreeing votes. The ensemble model achieved an accuracy of 76% on hand-drawn hydrocarbons, increasing to 86% if the top 3 predictions were considered.
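The committee vote described in the abstract can be illustrated with a minimal sketch. It assumes each trained network has already produced one SMILES string for the same image, and it takes the confidence to be the fraction of networks that agree, which is one plausible reading of "the number of agreeing votes". The function name `committee_vote` and the example votes are hypothetical and are not taken from the paper.

```python
from collections import Counter


def committee_vote(predictions, top_k=3):
    """Rank predicted SMILES strings by committee agreement.

    predictions: one SMILES string per committee member (assumed to be
    already normalized so that identical molecules compare equal).
    Returns up to top_k (smiles, confidence) pairs, where confidence is
    the fraction of members that voted for that string.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    total = len(predictions)
    # most_common orders by vote count; ties keep first-seen order.
    return [(smiles, votes / total) for smiles, votes in counts.most_common(top_k)]


# Hypothetical outputs of a five-member committee for one drawing.
votes = ["CCC", "CCC", "C=CC", "CCC", "CC"]
ranked = committee_vote(votes)
best_smiles, confidence = ranked[0]
print(best_smiles, confidence)
```

In practice, different SMILES strings can denote the same molecule, so the votes would need to be canonicalized before counting; that step is left out here to keep the sketch dependency-free.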
Pages: 10622-10633
Page count: 12