Comparison of the NCI open database with seven large chemical structural databases

Cited by: 176
Authors
Voigt, JH
Bienfait, B
Wang, SM
Nicklaus, MC
Affiliations
[1] NCI, Med Chem Lab, Canc Res Ctr, NIH, Frederick, MD 21702 USA
[2] Georgetown Univ, Med Ctr, Struct Biol & Canc Drug Discovery Program, Lombardi Canc Ctr, Washington, DC 20007 USA
[3] Georgetown Univ, Med Ctr, Dept Oncol, Washington, DC 20007 USA
[4] Georgetown Univ, Med Ctr, Dept Neurosci, Washington, DC 20007 USA
DOI
10.1021/ci000150t
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Eight large chemical databases have been analyzed and compared to each other. Central to this comparison is the open National Cancer Institute (NCI) database, consisting of approximately 250 000 structures. The other databases analyzed are the Available Chemicals Directory ("ACD," from MDL, release 1.99, 3D-version); the ChemACX ("ACX," from CamSoft, Version 4.5); the Maybridge Catalog and the Asinex database (both as distributed by CamSoft as part of ChemInfo 4.5); the Sigma-Aldrich Catalog (CD-ROM, 1999 Version); the World Drug Index ("WDI," Derwent, version 1999.03); and the organic part of the Cambridge Crystallographic Database ("CSD," from Cambridge Crystallographic Data Center, 1999 Version 5.18). The database properties analyzed are internal duplication rates; compounds unique to each database; cumulative occurrence of compounds in an increasing number of databases; overlap of identical compounds between two databases; similarity overlap; diversity; and others. The crystallographic database CSD and the WDI show somewhat less overlap with the other databases than those with each other. In particular, the collections of commercial compounds and compilations of vendor catalogs have a substantial degree of overlap among each other. Still, no database is completely a subset of any other, and each appears to have its own niche and thus "raison d'être". The NCI database has by far the highest number of compounds that are unique to it. Approximately 200 000 of the NCI structures were not found in any of the other analyzed databases.
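Several of the properties named in the abstract (internal duplication rate, compounds unique to one database, pairwise overlap of identical compounds, and the number of databases in which each compound occurs) reduce to set operations once every structure has a canonical key. The following is a minimal sketch under that assumption, not the authors' procedure: the abstract does not say how identical structures were detected, and the toy keys (`c1`, `c2`, ...), the contents of `databases`, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each database is a list of canonical structure keys.
# Real keys would come from a structure canonicalization step (an assumption).
databases = {
    "NCI": ["c1", "c2", "c3", "c3", "c4"],
    "ACD": ["c2", "c5", "c6"],
    "WDI": ["c2", "c4", "c7"],
}

def duplication_rate(records):
    """Fraction of records that repeat a structure already in the same database."""
    return 1 - len(set(records)) / len(records)

# Collapse internal duplicates so each database is a set of distinct structures.
unique_sets = {name: set(records) for name, records in databases.items()}

def unique_to(name):
    """Structures found in the named database and in no other database."""
    others = set().union(*(s for n, s in unique_sets.items() if n != name))
    return unique_sets[name] - others

# Number of identical structures shared by each pair of databases.
pairwise_overlap = {
    (a, b): len(unique_sets[a] & unique_sets[b])
    for a, b in combinations(unique_sets, 2)
}

# For each structure, count how many databases contain it, then tally
# how many structures occur in exactly 1, 2, 3, ... databases.
occurrence = Counter()
for structures in unique_sets.values():
    occurrence.update(structures)
databases_per_compound = Counter(occurrence.values())
```

With real data, the same dictionary-of-sets layout would let `unique_to("NCI")` reproduce the kind of count the abstract reports for structures found only in the NCI database, provided the canonical keys are computed consistently across all sources.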
Pages: 702-712
Page count: 11