Large-Scale Comparison of Alternative Similarity Search Strategies with Varying Chemical Information Contents

Cited by: 4
Authors
Laufkoetter, Oliver [1]
Miyao, Tomoyuki [2, 3]
Bajorath, Juergen [1]
Affiliations
[1] Rheinische Friedrich Wilhelms Univ, Dept Life Sci Informat, Chem Biol & Med Chem, LIMES Program Unit,B IT, Endenicher Allee 19c, D-53115 Bonn, Germany
[2] Nara Inst Sci & Technol, Data Sci Ctr, 8916-5 Takayama Cho, Ikoma, Nara 6300192, Japan
[3] Nara Inst Sci & Technol, Grad Sch Sci & Technol, 8916-5 Takayama Cho, Ikoma, Nara 6300192, Japan
Source
ACS OMEGA | 2019 / Vol. 4 / Issue 12
Keywords
FINGERPRINTS; INCREASES; FEATURES
DOI
10.1021/acsomega.9b02470
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Similarity searching (SS) is a core approach in computational compound screening and has a long tradition in pharmaceutical research. Over the years, different approaches have been introduced to increase the information content of search calculations and optimize the ability to detect compounds having similar activity. We present a large-scale comparison of distinct search strategies on more than 600 qualifying compound activity classes. Challenging test cases for SS were identified and used to evaluate different ways to further improve search performance, which provided a differentiated view of alternative search strategies and their relative performance. It was found that search results could not only be improved by increasing compound input information but also by focusing similarity calculations on database compounds. In the presence of multiple active reference compounds, asymmetric SS with high weights on chemical features of target compounds emerged as an overall preferred approach across many different activity classes. These findings have implications for practical virtual screening applications.
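The abstract does not state which similarity coefficient was used, so the following is only a minimal sketch of how asymmetric similarity searching is commonly expressed, assuming a Tversky-style coefficient over fingerprints represented as sets of feature indices. The function names (`tversky`, `max_fusion_score`), the toy fingerprints, and the choice of maximum-score fusion over multiple reference compounds are illustrative assumptions, not details taken from the paper.

```python
from typing import Iterable, Set


def tversky(reference: Set[int], candidate: Set[int],
            alpha: float, beta: float) -> float:
    """Tversky similarity between two fingerprints given as feature sets.

    alpha weights features found only in the reference compound,
    beta weights features found only in the database (candidate) compound.
    alpha == beta == 1 reduces to the Tanimoto coefficient.
    """
    common = len(reference & candidate)
    only_ref = len(reference - candidate)
    only_cand = len(candidate - reference)
    denom = common + alpha * only_ref + beta * only_cand
    return common / denom if denom else 0.0


def max_fusion_score(references: Iterable[Set[int]], candidate: Set[int],
                     alpha: float, beta: float) -> float:
    """Score a candidate against several references, keeping the best score.

    Maximum-score fusion is an assumption made for this sketch.
    """
    return max(tversky(ref, candidate, alpha, beta) for ref in references)


# Hypothetical toy fingerprints (sets of "on" bit positions).
ref_a = {1, 2, 3, 4}
ref_b = {7, 8, 9}
db_compound = {1, 2}

symmetric = tversky(ref_a, db_compound, alpha=1.0, beta=1.0)
# Weight placed entirely on features unique to the database compound.
db_weighted = tversky(ref_a, db_compound, alpha=0.0, beta=1.0)
fused = max_fusion_score([ref_a, ref_b], db_compound, alpha=0.0, beta=1.0)
```

In this toy case every feature of the database compound also occurs in `ref_a`, so shifting the weight toward database-compound features raises the score relative to the symmetric calculation. Which parameter settings correspond to the weighting evaluated in the study is not specified in the abstract.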
Pages: 15304-15311
Page count: 8