A Large-Scale Database for Chemical Structure Recognition and Preliminary Evaluation

Cited by: 1
Authors
Ding, Longfei [1, 2]
Zhao, Mengbiao [2, 3]
Yin, Fei [2, 3]
Zeng, Shuiling [1]
Liu, Cheng-Lin [2, 3]
Affiliations
[1] Jishou Univ, Sch Informat Sci & Engn, Jishou 416000, Peoples R China
[2] Chinese Acad Sci, Natl Lab Pattern Recognit NLPR, Inst Automat, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
Keywords
Chemical Structure Recognition; Database; Image-to-Markup; CLIDE
DOI
10.1109/ICPR56361.2022.9956654
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Chemical structure recognition (CSR), which transforms chemical structure images into character-string formulas (such as SMILES), is a challenging problem because of the complex 2D structures and relationships involved. No database of sufficient scale and diversity has been available for model design and fair evaluation in this area. In this paper, we present a large-scale chemical structure database named CASIA-CSDB, containing 480,668 samples (images paired with SMILES strings). To construct the database, we select chemical structures from ChEMBL, a well-known database of bioactive molecules, and use the RDKit tool to generate images from their SMILES strings. The selected structures represent the major types of chemical compounds and cover eight weight partitions. We also select a subset of 97,309 samples to form the Mini-CASIA-CSDB database. To provide a benchmark, we evaluate three state-of-the-art image-to-markup recognition methods on the database. The results show that the database is challenging. The database and its annotations are available at http://www.nlpr.ia.ac.cn/databases/CASIA-CSDB/index.html.
Pages: 1464-1470
Number of pages: 7