Mining a chemical database for fragment co-occurrence: Discovery of "chemical cliches"

Cited by: 39
Authors
Lameijer, EW
Kok, JN
Bäck, T
Ijzerman, AP
Affiliations
[1] Leiden Univ, Leiden Amsterdam Ctr Drug Res, Div Med Chem, NL-2300 RA Leiden, Netherlands
[2] Leiden Univ, LIACS, NL-2333 CA Leiden, Netherlands
[3] NuTech Solut, D-44227 Dortmund, Germany
DOI
10.1021/ci050370c
Chinese Library Classification number
R914 [Medicinal chemistry]
Discipline classification code
100701
Abstract
Millions of different compounds are now known, and their structures are stored in electronic databases. Analysis of these data could yield valuable insights into the laws of chemistry and the habits of chemists. We have therefore explored the public database of the National Cancer Institute (more than 250,000 compounds) by pattern searching. We split the molecules of this database into fragments to find out which fragments exist, how frequent they are, and whether the occurrence of one fragment in a molecule is related to the occurrence of another, nonoverlapping fragment. It turns out that some fragments and combinations of fragments are so frequent that they can be called "chemical cliches". We believe that the fragment data can give insight into the chemical space explored so far by synthesis. The lists of fragments and their (co-)occurrences can help create novel chemical compounds by (i) systematically listing the most popular and therefore most easily used substituents and ring systems for synthesizing new compounds, (ii) being an easily accessible repository for rarer fragments suitable for lead compound optimization, and (iii) pointing out some of the yet unexplored parts of chemical space.
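The counting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fragment labels and the toy molecule list are hypothetical, each molecule is assumed to be already split into a set of nonoverlapping fragments, and the observed/expected ratio under independence is one simple way (an assumption here, not taken from the abstract) to judge whether two fragments are related.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each molecule is already reduced to a set of
# nonoverlapping fragment labels (the real fragmentation step is not shown).
molecules = [
    {"benzene", "methyl", "hydroxyl"},
    {"benzene", "methyl"},
    {"benzene", "nitro"},
    {"pyridine", "methyl"},
    {"benzene", "methyl", "nitro"},
    {"cyclohexane", "hydroxyl"},
]


def cooccurrence_table(mols):
    """Count fragment occurrences and pairwise co-occurrences.

    Returns the per-fragment counts and a list of rows
    (frag_a, frag_b, observed, expected, ratio), sorted by ratio, where
    expected is the pair count predicted if the two fragments occurred
    independently of each other.
    """
    n = len(mols)
    single = Counter()
    pair = Counter()
    for frags in mols:
        single.update(frags)
        # Sort so that each unordered pair is always stored as (a, b) with a < b.
        pair.update(combinations(sorted(frags), 2))

    rows = []
    for (a, b), observed in pair.items():
        expected = single[a] * single[b] / n
        rows.append((a, b, observed, expected, observed / expected))
    rows.sort(key=lambda r: r[4], reverse=True)
    return single, rows


if __name__ == "__main__":
    counts, table = cooccurrence_table(molecules)
    print(counts.most_common(3))
    for a, b, obs, exp, ratio in table:
        print(f"{a:12s} {b:12s} observed={obs} expected={exp:.2f} ratio={ratio:.2f}")
```

A ratio well above 1 would flag a pair that occurs together more often than chance predicts, and a ratio well below 1 a pair that is rarely combined. On a database of realistic size a significance test would be needed before drawing conclusions from such ratios.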
Pages: 553-562
Number of pages: 10