ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning

Cited by: 20
Authors
Weir, Hayley [1, 2]
Thompson, Keiran [1, 2]
Woodward, Amelia [1]
Choi, Benjamin [3]
Braun, Augustin [1]
Martinez, Todd J. [1, 2]
Affiliations
[1] Stanford Univ, Dept Chem, Stanford, CA 94305 USA
[2] SLAC Natl Accelerator Lab, 2575 Sand Hill Rd, Menlo Pk, CA 94025 USA
[3] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Keywords
CHEMICAL UNIVERSE; VIRTUAL EXPLORATION; EXTRACTION; MOLECULES; NETWORKS; LANGUAGE; CLIDE;
DOI
10.1039/d1sc02957f
Chinese Library Classification (CLC)
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Inputting molecules into chemistry software, such as quantum chemistry packages, currently requires domain expertise, expensive software and/or cumbersome procedures. Leveraging recent breakthroughs in machine learning, we develop ChemPix: an offline, hand-drawn hydrocarbon structure recognition tool designed to remove these barriers. A neural image captioning approach consisting of a convolutional neural network (CNN) encoder and a long short-term memory (LSTM) decoder learned a mapping from photographs of hand-drawn hydrocarbon structures to machine-readable SMILES representations. We generated a large auxiliary training dataset, based on RDKit molecular images, by combining image augmentation, image degradation and background addition. Additionally, a small dataset of approximately 600 hand-drawn hydrocarbon chemical structures was crowd-sourced using a phone web application. These datasets were used to train the image-to-SMILES neural network with the goal of maximizing the hand-drawn hydrocarbon recognition accuracy. By forming a committee of the trained neural networks where each network casts one vote for the predicted molecule, we achieved a nearly 10 percentage point improvement of the molecule recognition accuracy and were able to assign a confidence value for the prediction based on the number of agreeing votes. The ensemble model achieved an accuracy of 76% on hand-drawn hydrocarbons, increasing to 86% if the top 3 predictions were considered.
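The committee voting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `committee_vote`, the `top_k` parameter, and the example SMILES strings are hypothetical, and the sketch assumes every network emits SMILES in the same canonical form so that identical molecules compare equal as strings.

```python
from collections import Counter


def committee_vote(predictions, top_k=3):
    """Rank candidate SMILES strings by the fraction of committee votes.

    predictions: one predicted SMILES string per trained network
    (assumed to be already canonicalized; this is an assumption of the sketch).
    Returns up to top_k (smiles, confidence) pairs, most-voted first.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    total = len(predictions)
    # Confidence is the share of networks that agree on a candidate.
    return [(smiles, votes / total) for smiles, votes in counts.most_common(top_k)]


# Hypothetical outputs of a five-member committee for one image.
votes = ["CCC", "CCC", "C=CC", "CCC", "CC"]
ranked = committee_vote(votes)
best_smiles, best_confidence = ranked[0]
```

Here `ranked[0]` holds the majority prediction together with the fraction of agreeing networks, and the remaining entries correspond to the lower-ranked candidates that a top-3 evaluation would also consider.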
Pages: 10622-10633
Number of pages: 12