Benchmark on Indexing Algorithms for Accelerating Molecular Similarity Search

Cited by: 3
Authors
Zhu, Chun Jiang [1]
Song, Minghu [2]
Liu, Qinqing [1]
Becquey, Chloe [1]
Bi, Jinbo [1]
Affiliations
[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT 06269 USA
[2] Univ Connecticut, Dept Biomed Engn, Storrs, CT 06269 USA
Keywords
Query processing - Graphic methods - Indexing (of information) - Economic and social effects - Benchmarking - C++ (programming language);
DOI
10.1021/acs.jcim.0c00393
Chinese Library Classification number
R914 [Medicinal chemistry]
Subject classification code
100701
Abstract
Structurally similar analogues of given query compounds can be rapidly retrieved from chemical databases by molecular similarity search approaches. However, the computational cost of an exhaustive similarity search over a large compound database is high. Although the latest indexing algorithms can greatly speed up the search process, they are not readily applicable to molecular similarity search problems because they lack an implementation of the Tanimoto similarity metric. In this paper, we first implement Python or C++ code to enable Tanimoto similarity search via several recent indexing algorithms, such as Hnsw and Onng. Moreover, there is increasing interest in the computational community in developing robust benchmarking systems to assess the performance of various computational algorithms. Here, we provide a benchmark to evaluate the molecular similarity search performance of these recent indexing algorithms. To avoid potential package dependency issues, two separate benchmarks are built on currently popular container technologies, Docker and Singularity. Singularity is a relatively new container framework designed specifically for high-performance computing (HPC) platforms; it requires neither privileged permissions nor a separate daemon process. Both benchmarks are extensible to incorporate new indexing algorithms, benchmarking data sets, and customized parameter settings. Our results demonstrate that graph-based methods, such as Hnsw and Onng, consistently achieve the best trade-off between search effectiveness and search efficiency. The source code of the entire benchmark system can be downloaded from https://github.uconn.edu/mldrugdiscovery/MssBenchmark.
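The abstract contrasts indexed search with exhaustive Tanimoto similarity search. The sketch below illustrates only that exhaustive baseline, not the authors' code or any indexing library's API. It assumes binary fingerprints stored as Python integers used as bitsets; the names `tanimoto`, `tanimoto_distance`, and `exhaustive_search` are hypothetical. For binary fingerprints, Tanimoto similarity is |A ∩ B| / |A ∪ B|. Graph-based indexes generally rank neighbors by a distance, so the sketch also includes 1 - Tanimoto, which is a true metric (the Jaccard distance).

```python
import heapq
from typing import List, Tuple


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer fingerprint."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto (Jaccard) similarity of two bit fingerprints: |A & B| / |A | B|.

    Assumption: two all-zero fingerprints are treated as identical (1.0);
    libraries differ on this edge case.
    """
    union = popcount(a | b)
    if union == 0:
        return 1.0
    return popcount(a & b) / union


def tanimoto_distance(a: int, b: int) -> float:
    """1 - Tanimoto similarity, usable wherever an index expects a distance."""
    return 1.0 - tanimoto(a, b)


def exhaustive_search(query: int, database: List[int], k: int) -> List[Tuple[float, int]]:
    """Brute-force top-k search: scores every entry, so cost grows linearly with N.

    Returns (similarity, index) pairs, most similar first; ties keep database order.
    """
    scored = ((tanimoto(query, fp), i) for i, fp in enumerate(database))
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    # Hypothetical 4-bit fingerprints, for illustration only.
    db = [0b1111, 0b1010, 0b0101]
    print(exhaustive_search(0b1010, db, 2))  # [(1.0, 1), (0.5, 0)]
```

Every query here touches every database entry, which is the linear per-query cost the abstract describes; the indexing algorithms it benchmarks aim to avoid that full scan.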
Pages: 6167-6184
Number of pages: 18
Related papers
50 in total
  • [31] Indexing schemes for similarity search in datasets of short protein fragments
    Stojmirovic, Aleksandar
    Pestov, Vladimir
    INFORMATION SYSTEMS, 2007, 32 (08) : 1145 - 1165
  • [32] Effective indexing and filtering for similarity search in large biosequence databases
    Ozturk, O
    Ferhatosmanoglu, H
    THIRD IEEE SYMPOSIUM ON BIOINFORMATICS AND BIOENGINEERING - BIBE 2003, PROCEEDINGS, 2003, : 359 - 366
  • [33] Multi Feature Indexing Network MUFIN for Similarity Search Applications
    Zezula, Pavel
    SOFSEM 2012: THEORY AND PRACTICE OF COMPUTER SCIENCE, 2012, 7147 : 77 - 87
  • [34] Indexing of Motion Capture Data for Efficient and Fast Similarity Search
    Li, Chuanjun
    Prabhakaran, B.
    JOURNAL OF COMPUTERS, 2006, 1 (03) : 35 - 42
  • [35] Indexing Dense Nested Metric Spaces for Efficient Similarity Search
    Brisaboa, Nieves R.
    Luaces, Miguel R.
    Pedreira, Oscar
    Places, Angeles S.
    Seco, Diego
    PERSPECTIVES OF SYSTEMS INFORMATICS, 2010, 5947 : 98 - 109
  • [36] Similarity search of time series with moving average based indexing
    Lin, Zi-Yu
    Yang, Dong-Qing
    Wang, Teng-Jiao
    Ruan Jian Xue Bao/Journal of Software, 2008, 19 (09) : 2349 - 2361
  • [37] ProBench: a benchmark dataset for evaluating the process similarity search methods
    Cao B.
    Wang J.
    An W.
    Fan J.
    Cheng S.
    Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS, 2017, 23 (05) : 1069 - 1079
  • [38] BLOX: Macro Neural Architecture Search Benchmark and Algorithms
    Chau, Thomas
    Dudziak, Lukasz
    Wen, Hongkai
    Lane, Nicholas D.
    Abdelfattah, Mohamed S.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [39] Accelerating Exact Similarity Search on CPU-GPU Systems
    Matsumoto, Takazumi
    Yiu, Man Lung
    2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2015, : 320 - 329
  • [40] Accelerating Graph Similarity Search via Efficient GED Computation
    Chang, Lijun
    Feng, Xing
    Yao, Kai
    Qin, Lu
    Zhang, Wenjie
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (05) : 4485 - 4498