Molecular diversity in chemical databases: Comparison of medicinal chemistry knowledge bases and databases of commercially available compounds

Cited by: 94
Authors
Cummins, DJ [1 ]
Andrews, CW [1 ]
Bentley, JA [1 ]
Cory, M [1 ]
Affiliation
[1] Glaxo Wellcome, Division of Information Technology, Research Triangle Park, NC 27709
Keywords
volume; factor analysis; outliers; hypercube; chemical descriptor; database mining
DOI
10.1021/ci950168h
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
A molecular descriptor space has been developed which describes structural diversity. Large databases of molecules have been mapped into it and compared. This analysis used five chemical databases, CMC and MDDR, which represent knowledge bases containing active medicinal agents, ACD and SPECS, two databases of commercially available compounds, and finally the Wellcome Registry. Together these databases contained more than 300 000 structures. Topological indices and the free energy of solvation were computed for each compound in the databases. Factor analysis was used to reduce the dimensionality of the descriptor space. Low density observations were deleted as a way of removing outliers, which allowed a further reduction in the descriptor space of interest. The five databases could then be compared on an efficient basis using a metric developed for this purpose. A Riemann gridding scheme was used to subdivide the factor space into subhypercubes to obtain accurate comparisons. Most of the 300 000 structures were highly clustered, but unique structures were found. An analysis of overlap between the biological and commercial databases was carried out. The metric provides a useful algorithm for choosing screening sets of diverse compounds from large databases.
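The gridding idea in the abstract can be illustrated with a minimal sketch. It assumes each compound has already been reduced to a point in a low-dimensional factor space. The function names (`occupied_cells`, `drop_sparse`, `cell_overlap`), the uniform grid, the count threshold, and the Jaccard-style overlap score are all hypothetical choices for illustration; the record does not specify the authors' actual metric or outlier rule.

```python
import numpy as np

def occupied_cells(points, lo, hi, bins_per_axis):
    """Assign each point to a grid cell (subhypercube) and count points per cell.

    points: (n, d) array of factor scores (assumed input, not from the paper).
    lo, hi: lower and upper bounds of the gridded region (scalars or length-d).
    """
    points = np.asarray(points, dtype=float)
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    # Scale to [0, 1], then to integer cell indices along each axis.
    scaled = (points - lo) / (hi - lo)
    idx = np.floor(scaled * bins_per_axis).astype(int)
    # Points on or beyond the boundary are folded into the edge cells.
    idx = np.clip(idx, 0, bins_per_axis - 1)
    cells, counts = np.unique(idx, axis=0, return_counts=True)
    return {tuple(int(v) for v in c): int(n) for c, n in zip(cells, counts)}

def drop_sparse(cells, min_count):
    """Discard low-density cells (an illustrative stand-in for outlier removal)."""
    return {c: n for c, n in cells.items() if n >= min_count}

def cell_overlap(cells_a, cells_b):
    """Fraction of occupied cells shared by two databases (Jaccard index)."""
    union = cells_a.keys() | cells_b.keys()
    if not union:
        return 0.0
    return len(cells_a.keys() & cells_b.keys()) / len(union)

# Toy data standing in for two databases mapped into a 2-D factor space.
db_a = [[0.10, 0.10], [0.15, 0.12], [0.90, 0.90]]
db_b = [[0.12, 0.11], [0.50, 0.50]]
cells_a = occupied_cells(db_a, 0.0, 1.0, bins_per_axis=4)
cells_b = occupied_cells(db_b, 0.0, 1.0, bins_per_axis=4)
overlap = cell_overlap(cells_a, cells_b)
```

Counting occupied cells gives a volume-like measure of how much of the space a database covers, and comparing the sets of occupied cells gives a simple notion of overlap between databases. A finer grid resolves more structure but leaves more cells empty, so `bins_per_axis` is a trade-off that would need tuning.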
Pages: 750-763
Number of pages: 14