Similarity-based data mining in files of two-dimensional chemical structures using fingerprint measures of molecular resemblance

Cited by: 20
Author
Willett, Peter [1]
Affiliation
[1] Univ Sheffield, Informat Sch, Sheffield, S Yorkshire, England
Keywords
COMBINATORIAL LIBRARIES; CLUSTERING METHODS; COMPOUND SELECTION; GROUP FUSION; DIVERSITY; DISSIMILARITY; PROPERTY; OPTIMIZATION; DESCRIPTORS; ALGORITHMS;
DOI
10.1002/widm.26
Chinese Library Classification number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper reviews the use of measures of intermolecular similarity for processing databases of chemical structures, which play an important role in the discovery of new drugs by the pharmaceutical industry. The similarity measures considered here are based on the use of a fingerprint representation of molecular structure, where a fingerprint is a vector encoding the presence of fragment substructures in a molecule and where the similarity between pairs of such fingerprints is computed using an association coefficient such as the Tanimoto coefficient. The Similar Property Principle provides the basic rationale for the use of similarity methods in three important chemoinformatics applications: similarity searching, database clustering, and molecular diversity analysis. Similarity searching enables the identification of those molecules in a database that are most similar to a user-defined, biologically active query molecule, with data fusion providing an effective way of combining the results of multiple similarity searches. Cluster analysis, typically using the Jarvis-Patrick, Ward, or divisive k-means clustering methods, enables the cost-effective selection of molecules for biological testing, for property prediction, and for investigating database overlap. Molecular diversity analysis, typically using cluster-based, dissimilarity-based, or optimization-based approaches, enables the identification of structurally diverse sets of molecules, so as to ensure that the full chemical space spanned by a database is tested in the search for novel bioactive molecules. (C) 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 241-251 DOI: 10.1002/widm.26
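For two binary fingerprints A and B, the Tanimoto coefficient is usually written T = c / (a + b - c), where a and b are the numbers of bits set in A and B, and c is the number of bits set in both. The following is a minimal Python sketch of the operations the abstract names. It assumes each fingerprint is modelled as a frozenset of "on" bit positions rather than the fixed-length bit vectors that real chemoinformatics toolkits use. The function names, the MAX fusion rule, and the MaxMin selection algorithm are illustrative choices and are not necessarily the exact methods reviewed in the paper.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Assumption: a fingerprint is modelled as the set of "on" bit positions,
# i.e. the indices of the fragment substructures present in a molecule.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient c / (a + b - c) for two binary fingerprints."""
    common = len(a & b)               # c: bits set in both fingerprints
    union = len(a) + len(b) - common  # a + b - c
    # Assumption: two empty fingerprints are given a similarity of 0.
    return common / union if union else 0.0


def similarity_search(query: Fingerprint,
                      database: Dict[str, Fingerprint],
                      k: int) -> List[Tuple[str, float]]:
    """Rank the database against a single query; return the top k hits."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    # Python's sort is stable, so tied molecules keep their database order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def group_fusion_search(queries: Sequence[Fingerprint],
                        database: Dict[str, Fingerprint],
                        k: int) -> List[Tuple[str, float]]:
    """Fuse several searches by scoring each molecule with its MAXIMUM
    similarity to any reference active (one common fusion rule)."""
    scored = [(name, max(tanimoto(q, fp) for q in queries))
              for name, fp in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def maxmin_select(database: Dict[str, Fingerprint],
                  n: int, seed: str) -> List[str]:
    """Dissimilarity-based selection (MaxMin): repeatedly add the candidate
    whose nearest already-selected molecule is furthest away from it."""
    selected = [seed]
    candidates = [name for name in database if name != seed]
    while len(selected) < n and candidates:
        # Distance is taken as 1 - Tanimoto similarity.
        best = max(candidates,
                   key=lambda c: min(1.0 - tanimoto(database[c], database[s])
                                     for s in selected))
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, with a hypothetical database in which `mol_a` is `frozenset({1, 2, 3, 4})`, the query `frozenset({1, 2, 3})` scores 3 / (3 + 4 - 3) = 0.75 against it. The MAX rule rewards a molecule that closely resembles any one of the reference actives, which is why it is often paired with structurally varied query sets.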
Pages: 241-251
Number of pages: 11
Related papers
50 results in total
  • [1] Similarity-based data mining in files of two-dimensional chemical structures using fingerprint measures of molecular resemblance
    Information School, University of Sheffield, Sheffield, United Kingdom
    Wiley Interdiscip. Rev. Data Min. Knowl. Discov., 2011, 1 (03): : 241 - 251
  • [2] Clustering of chemical structures on the basis of two-dimensional similarity measures
    Barnard, J.M.
    Downs, G.M.
    Journal of Chemical Information and Computer Sciences, 1992, 32 (06):
  • [3] Molecular similarity using two-dimensional representations of structures
    Richards, WG
    Robinson, DD
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 1997, 214 : 48 - CINF
  • [4] Similarity searching in files of three-dimensional chemical structures: Analysis of the BIOSTER database using two-dimensional fingerprints and molecular field descriptors
    Schuffenhauer, A
    Gillet, VJ
    Willett, P
    JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2000, 40 (02): : 295 - 307
  • [5] SIMILARITY SEARCHING IN FILES OF 3-DIMENSIONAL CHEMICAL STRUCTURES - COMPARISON OF FRAGMENT-BASED MEASURES OF SHAPE SIMILARITY
    BATH, PA
    POIRRETTE, AR
    WILLETT, P
    ALLEN, FH
    JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1994, 34 (01): : 141 - 147
  • [6] Similarity calculations using two-dimensional molecular representations
    Allen, BCP
    Grant, GH
    Richards, WG
    JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2001, 41 (02): : 330 - 337
  • [7] Maintenance of Nursing Care Plan using Similarity-based Data Mining Methods
    Iwata, Haruko
    Tsumoto, Shusaku
    Hirano, Shoji
    2013 ICME INTERNATIONAL CONFERENCE ON COMPLEX MEDICAL ENGINEERING (CME), 2013, : 97 - 102
  • [8] Weighted similarity-based clustering of chemical structures and bioactivity data in early drug discovery
    Perualila-Tan, Nolen Joy
    Shkedy, Ziv
    Talloen, Willem
    Gohlmann, Hinrich W. H.
    Van Moerbeke, Marijke
    Kasim, Adetayo
    JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2016, 14 (04)
  • [9] Accelerating Similarity-based Mining Tasks on High-dimensional Data by Processing-in-memory
    Wang, Fang
    Yiu, Man Lung
    Shao, Zili
    2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), 2021, : 1859 - 1864
  • [10] Similarity searching in files of three-dimensional chemical structures: Flexible field-based searching of molecular electrostatic potentials
    Thorner, DA
    Wild, DJ
    Willett, P
    Wright, PM
    JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 1996, 36 (04): : 900 - 908