Benchmark on Indexing Algorithms for Accelerating Molecular Similarity Search

Cited by: 3
Authors
Zhu, Chun Jiang [1]
Song, Minghu [2]
Liu, Qinqing [1]
Becquey, Chloe [1]
Bi, Jinbo [1]
Affiliations
[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT 06269 USA
[2] Univ Connecticut, Dept Biomed Engn, Storrs, CT 06269 USA
Keywords
Query processing - Graphic methods - Indexing (of information) - Economic and social effects - Benchmarking - C++ (programming language);
DOI
10.1021/acs.jcim.0c00393
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Molecular similarity search approaches can rapidly retrieve structurally similar analogues of given query compounds from chemical databases. However, an exhaustive similarity search over a large compound database is computationally expensive. Although the latest indexing algorithms can greatly speed up the search process, they are not readily applicable to molecular similarity search problems because they lack an implementation of the Tanimoto similarity metric. In this paper, we first implement Python or C++ code to enable Tanimoto similarity search via several recent indexing algorithms, such as Hnsw and Onng. Moreover, there is increasing interest in the computational community in developing robust benchmarking systems to assess the performance of various computational algorithms. Here, we provide a benchmark to evaluate the molecular similarity search performance of these recent indexing algorithms. To avoid potential package dependency issues, two separate benchmarks are built on currently popular container technologies, Docker and Singularity. Singularity is a relatively new container framework designed specifically for high-performance computing (HPC) platforms; it needs neither privileged permissions nor a separate daemon process. Both benchmarks can be extended to incorporate new indexing algorithms, benchmarking data sets, and customized parameter settings. Our results demonstrate that graph-based methods, such as Hnsw and Onng, consistently achieve the best trade-off between search effectiveness and search efficiency. The source code of the entire benchmark system can be downloaded from https://github.uconn.edu/mldrugdiscovery/MssBenchmark.
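For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of Tanimoto similarity on binary fingerprints, together with the exhaustive scan that indexing algorithms aim to avoid. It is not taken from the paper or its repository: the function names `tanimoto` and `exhaustive_search`, the use of Python integers as bit vectors, the toy 8-bit fingerprints, and the convention that two empty fingerprints score 1.0 are all assumptions made for illustration.

```python
import heapq


def tanimoto(a, b):
    """Tanimoto similarity of two binary fingerprints stored as Python ints.

    Defined as |A AND B| / |A OR B| over the set bits.
    """
    union = bin(a | b).count("1")
    if union == 0:
        # Assumed convention: two all-zero fingerprints count as identical.
        return 1.0
    return bin(a & b).count("1") / union


def exhaustive_search(query, database, k):
    """Return the k best (similarity, index) pairs by scoring every entry."""
    scored = ((tanimoto(query, fp), i) for i, fp in enumerate(database))
    return heapq.nlargest(k, scored)


# Toy 8-bit fingerprints, made up for illustration only.
db = [0b10110010, 0b10110011, 0b01001100, 0b11110000]
query = 0b10110010
top = exhaustive_search(query, db, k=2)
print(top)  # [(1.0, 0), (0.8, 1)]
```

The scan touches every fingerprint, so its cost grows linearly with database size; this is the baseline against which approximate indexes such as the graph-based methods mentioned above trade a small loss in recall for speed.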
Pages: 6167-6184
Number of pages: 18