A Large-Scale Database for Chemical Structure Recognition and Preliminary Evaluation

Cited by: 1
Authors
Ding, Longfei [1 ,2 ]
Zhao, Mengbiao [2 ,3 ]
Yin, Fei [2 ,3 ]
Zeng, Shuiling [1 ]
Liu, Cheng-Lin [2 ,3 ]
Affiliations
[1] Jishou Univ, Sch Informat Sci & Engn, Jishou 416000, Peoples R China
[2] Chinese Acad Sci, Natl Lab Pattern Recognit NLPR, Inst Automat, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
Keywords
Chemical Structure Recognition; Database; Image-to-Markup; CLIDE
DOI
10.1109/ICPR56361.2022.9956654
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Chemical structure recognition (CSR), which transforms chemical structure images into character-string formulas (such as SMILES), is a challenging problem because of the complex 2D structures and relationships involved. No database of sufficient scale and diversity exists for model design and fair evaluation in this area. In this paper, we present a large-scale chemical structure database named CASIA-CSDB, containing 480,668 samples (images paired with their SMILES strings). To construct the database, we select chemical structures from ChEMBL, a well-known database of bioactive molecules, and use the RDKit tool to generate images from the SMILES strings. The selected structures represent the major types of chemical compounds and cover eight weight partitions. We also select a subset of 97,309 samples from the database to form the Mini-CASIA-CSDB database. To provide a benchmark, we evaluate three state-of-the-art image-to-markup recognition methods on the database. The results show that the database is challenging. The database and its annotations are available at http://www.nlpr.ia.ac.cn/databases/CASIA-CSDB/index.html.
Pages: 1464-1470
Page count: 7