SIMILARITY SEARCHING AND CLUSTERING OF CHEMICAL-STRUCTURE DATABASES USING MOLECULAR PROPERTY DATA

Cited by: 121
Authors
DOWNS, GM
WILLETT, P
FISANICK, W
Affiliations
[1] UNIV SHEFFIELD,DEPT INFORMAT STUDIES,SHEFFIELD S10 2TN,S YORKSHIRE,ENGLAND
[2] CHEM ABSTRACTS SERV INC,COLUMBUS,OH 43210
DOI
10.1021/ci00021a011
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Previous work on the clustering of chemical-structure databases has focused on the use of intermolecular similarity measures that are based on structural features of various kinds. In this paper, we report nearest-neighbor searching and clustering experiments with a set of 5982 molecules, each of which is characterized by 13 calculated global molecular properties. The nearest-neighbor algorithm is an upper-bound procedure that uses the triangle inequality to minimize the number of distance calculations that need to be carried out when searching for nearest neighbors in metric spaces. Our experiments suggest that it performs well when small numbers of nearest neighbors are required, but that the basic "brute-force" procedure is best when large numbers are needed, such as when clustering is to be carried out. The clustering methods tested are the Ward and group-average hierarchic agglomerative methods, the minimum-diameter polythetic hierarchic divisive method, and the Jarvis-Patrick nearest-neighbor method. Our experiments suggest that the first three methods, which gave similar results, are the best methods for clustering molecules characterized by property data. The Jarvis-Patrick method, which has been extensively used for clustering molecules characterized by structural fragments, was not as effective as these other methods.
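The abstract's idea of using the triangle inequality to avoid distance calculations can be illustrated with a minimal sketch. This is not the paper's upper-bound procedure, whose details are not given here; it is a generic single-pivot variant, assuming Euclidean distance and synthetic 13-dimensional property vectors. The names `euclidean`, `brute_force_knn`, and `pivot_knn` are hypothetical. For any pivot p, |d(q, p) - d(x, p)| is a lower bound on d(q, x), so a candidate x can be skipped once that bound is no smaller than the current k-th best distance.

```python
import heapq
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length property vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(points, query, k):
    """Compute every distance, then keep the k smallest."""
    dists = sorted((euclidean(query, p), i) for i, p in enumerate(points))
    return dists[:k], len(points)


def pivot_knn(points, pivot_dists, pivot, query, k):
    """k-NN search that prunes with the triangle inequality.

    pivot_dists[i] must hold the precomputed distance d(points[i], pivot).
    Returns the k nearest (distance, index) pairs and the number of
    distance calculations performed for this query.
    """
    d_qp = euclidean(query, pivot)
    n_calcs = 1
    # Visit candidates in order of increasing lower bound |d(q,p) - d(x,p)|.
    order = sorted(range(len(points)),
                   key=lambda i: abs(d_qp - pivot_dists[i]))
    best = []  # max-heap on distance, stored as (-distance, index)
    for i in order:
        lower = abs(d_qp - pivot_dists[i])
        if len(best) == k and lower >= -best[0][0]:
            break  # every remaining candidate has an even larger bound
        d = euclidean(query, points[i])
        n_calcs += 1
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return sorted((-nd, i) for nd, i in best), n_calcs


# Synthetic data only: random vectors standing in for 13 molecular properties.
rng = random.Random(0)
molecules = [[rng.random() for _ in range(13)] for _ in range(500)]
pivot = molecules[0]
pivot_dists = [euclidean(m, pivot) for m in molecules]
query = [rng.random() for _ in range(13)]

neighbors, n_calcs = pivot_knn(molecules, pivot_dists, pivot, query, k=3)
print("pruned search distance calculations:", n_calcs)
```

How much a single pivot saves depends on the data and on k; as k grows, the k-th best distance grows too and fewer candidates can be pruned, which is consistent with the abstract's observation that brute force is preferable when many neighbors are needed.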
Pages: 1094 - 1102
Page count: 9
Related papers
50 in total
  • [1] The application of data fusion to similarity searching in chemical databases
    Ginn, CMR
    Ranada, SS
    Willett, P
    Bradshaw, J
    [J]. FUSION'98: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MULTISOURCE-MULTISENSOR INFORMATION FUSION, VOLS 1 AND 2, 1998, : 307 - 313
  • [2] Effect of Data Standardization on Chemical Clustering and Similarity Searching
    Chu, Chia-Wei
    Holliday, John D.
    Willett, Peter
    [J]. JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2009, 49 (02) : 155 - 161
  • [3] INORGANIC CHEMICAL-STRUCTURE SEARCHING
    RUSCH, PF
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 1986, 192 : 29 - CNIF
  • [4] VERY LARGE CHEMICAL-STRUCTURE DATABASES - IMPLICATIONS IN MOLECULAR MODELING
    HARAKI, K
    VENKATARAGHAVAN, R
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 1995, 210 : 56 - CINF
  • [5] AUTOMATIC IDENTIFICATION OF MOLECULAR SIMILARITY USING REDUCED-GRAPH REPRESENTATION OF CHEMICAL-STRUCTURE
    TAKAHASHI, Y
    SUKEKAWA, M
    SASAKI, S
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1992, 32 (06) : 639 - 643
  • [6] Searching molecular structure databases using tandem MS data: are we there yet?
    Boecker, Sebastian
    [J]. CURRENT OPINION IN CHEMICAL BIOLOGY, 2017, 36 : 1 - 6
  • [7] EFFICIENT DESIGN FOR CHEMICAL-STRUCTURE SEARCHING
    FELDMAN, AP
    HODES, L
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 1974, : 17 - 17
  • [8] SEARCHING OF CHEMICAL-STRUCTURE DATA-BASES WITH PARALLEL COMPUTER HARDWARE
    RASMUSSEN, EM
    WILLETT, P
    WILSON, T
    MANSON, GA
    WILSON, GA
    [J]. ANALYTICA CHIMICA ACTA, 1990, 235 (01) : 77 - 86
  • [9] Shape-based similarity searching in chemical databases
    Finn, Paul W.
    Morris, Garrett M.
    [J]. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE, 2013, 3 (03) : 226 - 241
  • [10] CHEMICAL-STRUCTURE SEARCHING - USING S4/MOLKICK ON DIALOG
    WELFORD, SM
    [J]. ACS SYMPOSIUM SERIES, 1990, 436 : 64 - 79