Large-Scale Comparison of Alternative Similarity Search Strategies with Varying Chemical Information Contents

Times cited: 4
Authors
Laufkoetter, Oliver [1]
Miyao, Tomoyuki [2, 3]
Bajorath, Juergen [1]
Affiliations
[1] Rheinische Friedrich Wilhelms Univ, Dept Life Sci Informat, Chem Biol & Med Chem, LIMES Program Unit,B IT, Endenicher Allee 19c, D-53115 Bonn, Germany
[2] Nara Inst Sci & Technol, Data Sci Ctr, 8916-5 Takayama Cho, Ikoma, Nara 6300192, Japan
[3] Nara Inst Sci & Technol, Grad Sch Sci & Technol, 8916-5 Takayama Cho, Ikoma, Nara 6300192, Japan
Source
ACS OMEGA | 2019 / Vol. 4 / No. 12
Keywords
FINGERPRINTS; INCREASES; FEATURES
DOI
10.1021/acsomega.9b02470
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline code
0703
Abstract
Similarity searching (SS) is a core approach in computational compound screening and has a long tradition in pharmaceutical research. Over the years, different approaches have been introduced to increase the information content of search calculations and optimize the ability to detect compounds having similar activity. We present a large-scale comparison of distinct search strategies on more than 600 qualifying compound activity classes. Challenging test cases for SS were identified and used to evaluate different ways to further improve search performance, which provided a differentiated view of alternative search strategies and their relative performance. Search results could be improved not only by increasing compound input information but also by focusing similarity calculations on database compounds. In the presence of multiple active reference compounds, asymmetric SS with high weights on chemical features of target compounds emerged as the overall preferred approach across many different activity classes. These findings have implications for practical virtual screening applications.
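The abstract does not spell out the asymmetric similarity measure or how multiple reference compounds are combined. A common choice for asymmetric fingerprint comparison is the Tversky coefficient, so the following sketch assumes it, together with max fusion (each database compound keeps its best score over all references). Fingerprints are modeled as sets of "on" bit positions. The function names, the weights 0.1/0.9, the fusion rule, and the toy fingerprints are illustrative assumptions, not the study's published protocol or data.

```python
from typing import Dict, FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # assumed representation: set of "on" bit positions


def tversky(ref: Fingerprint, target: Fingerprint, alpha: float, beta: float) -> float:
    """Tversky similarity between a reference and a target (database) fingerprint.

    alpha weights features present only in the reference;
    beta weights features present only in the target.
    With alpha -> 0 and beta -> 1 the score approaches the fraction of the
    target's features found in the reference (target-weighted search).
    alpha = beta = 1 reduces to the Tanimoto coefficient.
    """
    common = len(ref & target)
    ref_only = len(ref - target)
    target_only = len(target - ref)
    denom = alpha * ref_only + beta * target_only + common
    return common / denom if denom > 0 else 0.0


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    # Symmetric special case of the Tversky coefficient.
    return tversky(a, b, 1.0, 1.0)


def max_fusion_score(refs: Sequence[Fingerprint], target: Fingerprint,
                     alpha: float, beta: float) -> float:
    # Assumed fusion rule: keep the highest similarity to any reference compound.
    return max(tversky(r, target, alpha, beta) for r in refs)


def rank_database(refs: Sequence[Fingerprint], database: Dict[str, Fingerprint],
                  alpha: float, beta: float) -> List[str]:
    # Score every database compound, then sort compound IDs by descending score.
    scores = {cid: max_fusion_score(refs, fp, alpha, beta) for cid, fp in database.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy fingerprints, not data from the study.
    refs = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5})]
    database = {
        "cpd_a": frozenset({2, 3}),
        "cpd_b": frozenset({1, 2, 3, 4, 5, 6, 7, 8}),
    }
    print(rank_database(refs, database, alpha=0.1, beta=0.9))  # ['cpd_a', 'cpd_b']
    print(rank_database(refs, database, alpha=0.9, beta=0.1))  # ['cpd_b', 'cpd_a']
```

With the target-weighted setting, the small compound cpd_a, whose features are all covered by a reference, ranks first; shifting the weight to reference features favors the larger cpd_b, which contains every feature of the first reference. The toy example only shows that the choice of weights can reorder a ranking; it says nothing about which setting performs better on real activity classes.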
Pages: 15304-15311
Number of pages: 8
Related papers (50 in total)
  • [1] Large-Scale Similarity Search with Optimal Transport
    Laouar, Clea
    Takezawa, Yuki
    Yamada, Makoto
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023: 11920 - 11930
  • [2] LARGE-SCALE BACTERIAL GENE DISCOVERY BY SIMILARITY SEARCH
    ROBISON, K
    GILBERT, W
    CHURCH, GM
    NATURE GENETICS, 1994, 7 (02) : 205 - 214
  • [3] Tree Quantization for Large-Scale Similarity Search and Classification
    Babenko, Artem
    Lempitsky, Victor
    2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015: 4240 - 4248
  • [4] Compact representation for large-scale clustering and similarity search
    Wang, Bin
    Chen, Yuanhao
    Lie, Zhiwei
    Lie, Mingjing
    Advances in Multimedia Information Processing - PCM 2006, Proceedings, 2006, 4261 : 835 - 843
  • [5] Large-scale Visual Search and Similarity for E-Commerce
    Anand, Gaurav
    Wang, Siyun
    Ni, Karl
    APPLICATIONS OF MACHINE LEARNING 2021, 2021, 11843
  • [6] Efficient Large-Scale Similarity Search Using Matrix Factorization
    Iscen, Ahmet
    Rabbat, Michael
    Furon, Teddy
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 2073 - 2081
  • [7] Large-Scale Analysis of Email Search and Organizational Strategies
    Narang, Kanika
    Dumais, Susan T.
    Craswell, Nick
    Liebling, Dan
    Ai, Qingyao
    CHIIR'17: PROCEEDINGS OF THE 2017 CONFERENCE HUMAN INFORMATION INTERACTION AND RETRIEVAL, 2017: 215 - 223
  • [8] A graph-based cache for large-scale similarity search engines
    Gil-Costa, Veronica
    Marin, Mauricio
    Bonacic, Carolina
    Solar, Roberto
    JOURNAL OF SUPERCOMPUTING, 2018, 74 (05): 2006 - 2034
  • [9] SEARCH FOR LARGE-SCALE COHERENT STRUCTURES IN THE SIMILARITY REGION OF A TURBULENT JET
    JSO, J
    KOVASZNAY, LSG
    HUSSAIN, AKMF
    BULLETIN OF THE AMERICAN PHYSICAL SOCIETY, 1979, 24 (08): 1133