ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning

Cited by: 20
Authors
Weir, Hayley [1 ,2 ]
Thompson, Keiran [1 ,2 ]
Woodward, Amelia [1 ]
Choi, Benjamin [3 ]
Braun, Augustin [1 ]
Martinez, Todd J. [1 ,2 ]
Affiliations
[1] Stanford Univ, Dept Chem, Stanford, CA 94305 USA
[2] SLAC Natl Accelerator Lab, 2575 Sand Hill Rd, Menlo Pk, CA 94025 USA
[3] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Keywords
CHEMICAL UNIVERSE; VIRTUAL EXPLORATION; EXTRACTION; MOLECULES; NETWORKS; LANGUAGE; CLIDE
DOI
10.1039/d1sc02957f
Chinese Library Classification (CLC)
O6 [Chemistry]
Discipline classification code
0703
Abstract
Inputting molecules into chemistry software, such as quantum chemistry packages, currently requires domain expertise, expensive software and/or cumbersome procedures. Leveraging recent breakthroughs in machine learning, we develop ChemPix: an offline, hand-drawn hydrocarbon structure recognition tool designed to remove these barriers. A neural image captioning approach consisting of a convolutional neural network (CNN) encoder and a long short-term memory (LSTM) decoder learned a mapping from photographs of hand-drawn hydrocarbon structures to machine-readable SMILES representations. We generated a large auxiliary training dataset, based on RDKit molecular images, by combining image augmentation, image degradation and background addition. Additionally, a small dataset of approximately 600 hand-drawn hydrocarbon chemical structures was crowd-sourced using a phone web application. These datasets were used to train the image-to-SMILES neural network with the goal of maximizing the hand-drawn hydrocarbon recognition accuracy. By forming a committee of the trained neural networks where each network casts one vote for the predicted molecule, we achieved a nearly 10 percentage point improvement in the molecule recognition accuracy and were able to assign a confidence value for the prediction based on the number of agreeing votes. The ensemble model achieved an accuracy of 76% on hand-drawn hydrocarbons, increasing to 86% if the top 3 predictions were considered.
Pages: 10622-10633
Page count: 12
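
The abstract describes a committee of trained image-to-SMILES networks in which each network casts one vote for the predicted molecule, and the number of agreeing votes serves as a confidence estimate. The following minimal Python sketch (not the authors' released code; the function name, tie-breaking behaviour and example SMILES are illustrative assumptions) shows one way such majority voting can be implemented:

from collections import Counter

def committee_predict(smiles_votes):
    # smiles_votes: list of SMILES strings, one prediction per ensemble member.
    # Majority vote: the most frequently predicted SMILES wins; the fraction of
    # agreeing votes acts as a confidence proxy, as described in the abstract.
    counts = Counter(smiles_votes)
    predicted_smiles, n_votes = counts.most_common(1)[0]
    confidence = n_votes / len(smiles_votes)
    return predicted_smiles, confidence

# Example: 7 of 10 hypothetical ensemble members agree on the same hydrocarbon.
votes = ["CC(C)CC=C"] * 7 + ["CC(C)C=CC"] * 2 + ["CCCCC=C"]
print(committee_predict(votes))  # ('CC(C)CC=C', 0.7)

A top-3 variant of this scheme (counting a prediction as correct if the true molecule appears among the three most-voted SMILES) corresponds to the 86% figure quoted in the abstract.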