Large-Scale Similarity Search Profiling of ChEMBL Compound Data Sets

Cited by: 64
Authors
Heikamp, Kathrin [1]
Bajorath, Juergen [1]
Affiliation
[1] Univ Bonn, Dept Life Sci Informat, B IT, LIMES Program Unit Chem Biol & Med Chem, D-53113 Bonn, Germany
Keywords
FINGERPRINTS; RECOMBINATION
DOI
10.1021/ci200199u
Chinese Library Classification (CLC) number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
A large-scale similarity search investigation has been carried out on 266 well-defined compound activity classes extracted from the ChEMBL database. The analysis was performed using two widely applied two-dimensional (2D) fingerprints that mark opposite ends of the current performance spectrum of these types of fingerprints, i.e., MACCS structural keys and the extended connectivity fingerprint with bond diameter four (ECFP4). For each fingerprint, three nearest neighbor search strategies were applied. On the basis of these search calculations, a similarity search profile of the ChEMBL database was generated. Overall, the fingerprint search campaign was surprisingly successful. In 203 of 266 test cases (~76%), a compound recovery rate of at least 50% was observed with at least the better performing fingerprint and one search strategy. The similarity search profile also revealed several general trends. For example, fingerprint searching was often characterized by an early enrichment of active compounds in database selection sets. In addition, compound activity classes have been categorized according to different similarity search performance levels, which helps to put the results of benchmark calculations into perspective. Therefore, a compendium of activity classes falling into different search performance categories is provided. On the basis of our large-scale investigation, the performance range of state-of-the-art 2D fingerprinting has been delineated for compound data sets directed against a wide spectrum of pharmaceutical targets.
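The abstract does not say which similarity metric or which three nearest neighbor strategies were used, so the following is only a minimal sketch of how a fingerprint-based nearest neighbor search and its compound recovery rate might be computed. It assumes Tanimoto similarity, fingerprints represented as sets of "on" bit positions, and a k-nearest-neighbor score taken as the mean of the k highest similarities to the reference compounds. All function and parameter names are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Sequence

# Assumption: a fingerprint is the set of its "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_score(candidate: Fingerprint,
              references: Sequence[Fingerprint],
              k: int) -> float:
    """Mean of the k highest similarities between a candidate and the references.

    With k=1 this reduces to a plain nearest neighbor (maximum similarity) score.
    """
    sims = sorted((tanimoto(candidate, ref) for ref in references), reverse=True)
    top = sims[:k]
    return sum(top) / len(top)


def recovery_rate(references: Sequence[Fingerprint],
                  database: Sequence[Fingerprint],
                  is_active: Sequence[bool],
                  k: int,
                  selection_size: int) -> float:
    """Fraction of all active database compounds found in the top-ranked selection set."""
    scores = [knn_score(fp, references, k) for fp in database]
    # Rank database compounds by decreasing score; ties keep database order.
    ranked = sorted(range(len(database)), key=lambda i: scores[i], reverse=True)
    selected = ranked[:selection_size]
    total_actives = sum(is_active)
    if total_actives == 0:
        return 0.0
    found = sum(1 for i in selected if is_active[i])
    return found / total_actives
```

Under these assumptions, an activity class would count toward the 203 of 266 successful cases (203/266 ≈ 0.763) when `recovery_rate` reaches at least 0.5 for at least one fingerprint and one search strategy.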
Pages: 1831-1839
Number of pages: 9