Benchmark on Indexing Algorithms for Accelerating Molecular Similarity Search

Cited by: 3
Authors
Zhu, Chun Jiang [1]
Song, Minghu [2]
Liu, Qinqing [1]
Becquey, Chloe [1]
Bi, Jinbo [1]
Affiliations
[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT 06269 USA
[2] Univ Connecticut, Dept Biomed Engn, Storrs, CT 06269 USA
Keywords
Query processing - Graphic methods - Indexing (of information) - Economic and social effects - Benchmarking - C++ (programming language)
DOI
10.1021/acs.jcim.0c00393
CLC Number (Chinese Library Classification)
R914 [Medicinal chemistry]
Subject Classification Code
100701
Abstract
Structurally similar analogues of given query compounds can be rapidly retrieved from chemical databases by molecular similarity search approaches. However, the computational cost of an exhaustive similarity search over a large compound database can be quite high. Although the latest indexing algorithms can greatly speed up the search process, they are not readily applicable to molecular similarity search problems because they lack an implementation of the Tanimoto similarity metric. In this paper, we first implement Python or C++ code to enable Tanimoto similarity search via several recent indexing algorithms, such as Hnsw and Onng. Moreover, there is increasing interest in computational communities in developing robust benchmarking systems to assess the performance of various computational algorithms. Here, we provide a benchmark to evaluate the molecular similarity search performance of these recent indexing algorithms. To avoid potential package dependency issues, two separate benchmarks are built on currently popular container technologies, Docker and Singularity. Singularity is a relatively new container framework designed specifically for high-performance computing (HPC) platforms; it requires neither privileged permissions nor a separate daemon process. Both benchmarks are extensible to incorporate other new indexing algorithms, benchmarking data sets, and customized parameter settings. Our results demonstrate that graph-based methods, such as Hnsw and Onng, consistently achieve the best trade-off between search effectiveness and search efficiency. The source code of the entire benchmark system can be downloaded from https://github.uconn.edu/mldrugdiscovery/MssBenchmark.
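To make the terms in the abstract concrete, here is a minimal sketch. It is not the MssBenchmark code, and it does not implement Hnsw or Onng. It assumes that fingerprints are fixed-length bit vectors stored as Python ints, and it covers four pieces:

- Tanimoto similarity, |A ∩ B| / |A ∪ B|.
- The exhaustive linear-scan baseline.
- A simplified graph-based search: a brute-force k-nearest-neighbour graph plus best-first beam search, standing in for the far more elaborate graph construction of the real methods.
- Recall@k as one possible effectiveness measure.

All function names and parameters (`m`, `ef`, `entry`) are hypothetical and chosen for illustration.

```python
import heapq
import random


def popcount(x: int) -> int:
    # Number of set bits; bin().count works on every Python 3 version.
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    # Tanimoto similarity of two bit-vector fingerprints: |A & B| / |A | B|.
    union = popcount(a | b)
    if union == 0:
        return 0.0  # assumed convention for two all-zero fingerprints
    return popcount(a & b) / union


def exhaustive_top_k(query: int, db: list, k: int) -> list:
    # Baseline: one Tanimoto computation per database entry (linear scan).
    # Ties are broken in favour of the lower index.
    return heapq.nlargest(k, range(len(db)), key=lambda i: tanimoto(query, db[i]))


def build_knn_graph(db: list, m: int) -> list:
    # Hypothetical simplified index: link each node to its m most similar
    # other nodes, computed by brute force in O(n^2).
    graph = []
    for i, fp in enumerate(db):
        others = (j for j in range(len(db)) if j != i)
        graph.append(heapq.nlargest(m, others, key=lambda j: tanimoto(fp, db[j])))
    return graph


def graph_search(query: int, db: list, graph: list, k: int, ef: int, entry: int = 0) -> list:
    # Best-first beam search: repeatedly expand the most similar unexpanded
    # node and keep the ef best nodes seen so far. Stop once the best
    # remaining candidate is worse than the worst kept result.
    sim0 = tanimoto(query, db[entry])
    visited = {entry}
    candidates = [(-sim0, entry)]  # max-heap on similarity (negated)
    results = [(sim0, entry)]      # min-heap holding at most ef results
    while candidates:
        neg_sim, node = heapq.heappop(candidates)
        if len(results) >= ef and -neg_sim < results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            s = tanimoto(query, db[nb])
            if len(results) < ef or s > results[0][0]:
                heapq.heappush(candidates, (-s, nb))
                heapq.heappush(results, (s, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    best = sorted(results, key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in best[:k]]


def recall_at_k(approx: list, exact: list) -> float:
    # Fraction of the exact top-k neighbours that the approximate search found.
    return len(set(approx) & set(exact)) / len(exact)


if __name__ == "__main__":
    rng = random.Random(0)
    db = [rng.getrandbits(64) for _ in range(500)]  # synthetic 64-bit fingerprints
    query = rng.getrandbits(64)
    exact = exhaustive_top_k(query, db, k=10)
    graph = build_knn_graph(db, m=8)
    approx = graph_search(query, db, graph, k=10, ef=50)
    print("recall@10:", recall_at_k(approx, exact))
```

The beam width `ef` controls the trade-off between effectiveness and efficiency that the abstract describes. A larger `ef` visits more nodes, which raises recall, at the cost of more Tanimoto evaluations per query.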
Pages: 6167 - 6184
Page count: 18
Related papers (50 in total)
  • [1] Similarity indexing: Algorithms and performance
    White, DA
    Jain, R
    STORAGE AND RETRIEVAL FOR STILL IMAGE AND VIDEO DATABASES IV, 1996, 2670 : 62 - 73
  • [2] Efficient Metric Indexing for Similarity Search and Similarity Joins
    Chen, Lu
    Gao, Yunjun
    Li, Xinhan
    Jensen, Christian S.
    Chen, Gang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (03) : 556 - 571
  • [3] Efficient Metric Indexing for Similarity Search
    Chen, Lu
    Gao, Yunjun
    Li, Xinhan
    Jensen, Christian S.
    Chen, Gang
    2015 IEEE 31ST INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2015 : 591 - 602
  • [4] Metric Indexing for Graph Similarity Search
    Bause, Franka
    Blumenthal, David B.
    Schubert, Erich
    Kriege, Nils M.
    SIMILARITY SEARCH AND APPLICATIONS, SISAP 2021, 2021, 13058 : 323 - 336
  • [5] Automatic Indexing for Similarity Search in ELKI
    Schubert, Erich
    SIMILARITY SEARCH AND APPLICATIONS (SISAP 2022), 2022, 13590 : 205 - 213
  • [6] Indexing Metric Spaces for Exact Similarity Search
    Chen, Lu
    Gao, Yunjun
    Song, Xuan
    Li, Zheng
    Zhu, Yifan
    Miao, Xiaoye
    Jensen, Christian S.
    ACM COMPUTING SURVEYS, 2023, 55 (06)
  • [7] Indexing schemes for similarity search: an illustrated paradigm
    Pestov, Vladimir
    Stojmirovic, Aleksandar
    FUNDAMENTA INFORMATICAE, 2006, 70 (04) : 367 - 385
  • [8] Optimal neighborhood indexing for protein similarity search
    Peterlongo, Pierre
    Noé, Laurent
    Lavenier, Dominique
    Nguyen, Van Hoa
    Kucherov, Gregory
    Giraud, Mathieu
    BMC BIOINFORMATICS, 2008, 9 (1)
  • [9] Indexing of plasma waveforms for accelerating search and retrieval of their subsequences
    Hochin, Teruhisa
    Yamauchi, Yoshihiro
    Nakanishi, Hideya
    Kojima, Mamoru
    Nomiya, Hiroki
    FUSION ENGINEERING AND DESIGN, 2010, 85 (05) : 649 - 654