Mining a chemical database for fragment co-occurrence: Discovery of "chemical cliches"

Cited by: 39

Authors
Lameijer, EW
Kok, JN
Bäck, T
Ijzerman, AP
Affiliations
[1] Leiden Univ, Leiden Amsterdam Ctr Drug Res, Div Med Chem, NL-2300 RA Leiden, Netherlands
[2] Leiden Univ, LIACS, NL-2333 CA Leiden, Netherlands
[3] NuTech Solut, D-44227 Dortmund, Germany
DOI
10.1021/ci050370c
Chinese Library Classification (CLC) number
R914 [Medicinal chemistry]
Subject classification code
100701
Abstract
Millions of different compounds are now known, and their structures are stored in electronic databases. Analysis of these data could yield valuable insights into the laws of chemistry and the habits of chemists. We have therefore explored the public database of the National Cancer Institute (> 250 000 compounds) by pattern searching. We split the molecules of this database into fragments to find out which fragments exist, how frequent they are, and whether the occurrence of one fragment in a molecule is related to the occurrence of another, nonoverlapping fragment. It turns out that some fragments and combinations of fragments are so frequent that they can be called "chemical cliches". We believe that the fragment data can give insight into the chemical space explored so far by synthesis. The lists of fragments and their (co-)occurrences can help create novel chemical compounds by (i) systematically listing the most popular, and therefore most easily used, substituents and ring systems for synthesizing new compounds, (ii) serving as an easily accessible repository of rarer fragments suitable for lead compound optimization, and (iii) pointing out some of the yet unexplored parts of chemical space.
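The counting described in the abstract (fragment frequencies, then co-occurrence of nonoverlapping fragments) can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes molecules have already been split into fragments (the fragmentation step itself, which would normally rely on a cheminformatics toolkit, is not shown) and that each fragment occurrence is given as a label plus the set of atom indices it covers. The names `fragment_counts`, `cooccurrence_counts` and `enrichment`, and the observed/expected ratio used as the co-occurrence measure, are hypothetical choices for illustration; the paper's own statistic may differ.

```python
from collections import Counter
from itertools import combinations

# Hypothetical representation: a molecule is a list of fragment occurrences,
# each given as (fragment_label, atom_indices_covered_by_that_occurrence).
Occurrence = tuple[str, frozenset[int]]
Molecule = list[Occurrence]


def fragment_counts(molecules: list[Molecule]) -> Counter:
    """Count the molecules containing each fragment at least once."""
    counts: Counter = Counter()
    for mol in molecules:
        counts.update({label for label, _ in mol})  # a set, so once per molecule
    return counts


def cooccurrence_counts(molecules: list[Molecule]) -> Counter:
    """Count the molecules in which two different fragments occur without sharing atoms."""
    pairs: Counter = Counter()
    for mol in molecules:
        found = set()  # each pair is counted at most once per molecule
        for (a, atoms_a), (b, atoms_b) in combinations(mol, 2):
            # Only nonoverlapping occurrences count as co-occurring.
            if a != b and atoms_a.isdisjoint(atoms_b):
                found.add(tuple(sorted((a, b))))
        pairs.update(found)
    return pairs


def enrichment(molecules: list[Molecule]) -> dict[tuple[str, str], float]:
    """Observed / expected co-occurrence, assuming fragments occur independently.

    Values well above 1 flag fragment pairs that appear together more often
    than their individual frequencies would predict.
    """
    n = len(molecules)
    single = fragment_counts(molecules)
    return {
        (a, b): observed / (single[a] * single[b] / n)
        for (a, b), observed in cooccurrence_counts(molecules).items()
    }
```

With such counts, `fragment_counts(molecules).most_common(k)` lists the `k` most frequent fragments, i.e. candidate "cliches", while rare labels in the same counter form the kind of repository of uncommon fragments the abstract mentions.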
Pages: 553-562
Number of pages: 10