Benchmark on Indexing Algorithms for Accelerating Molecular Similarity Search

Cited by: 3
Authors
Zhu, Chun Jiang [1]
Song, Minghu [2]
Liu, Qinqing [1]
Becquey, Chloe [1]
Bi, Jinbo [1]
Affiliations
[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT 06269 USA
[2] Univ Connecticut, Dept Biomed Engn, Storrs, CT 06269 USA
Keywords
Query processing - Graphic methods - Indexing (of information) - Economic and social effects - Benchmarking - C++ (programming language);
DOI
10.1021/acs.jcim.0c00393
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Molecular similarity search can rapidly retrieve structurally similar analogues of a given query compound from chemical databases. However, an exhaustive similarity search over a large compound database is computationally expensive. Although recent indexing algorithms can greatly speed up the search, they are not readily applicable to molecular similarity search because they lack an implementation of the Tanimoto similarity metric. In this paper, we first implement Python or C++ code that enables Tanimoto similarity search with several recent indexing algorithms, such as Hnsw and Onng. Moreover, there is growing interest in the computational community in developing robust benchmarking systems to assess the performance of computational algorithms. Here, we provide a benchmark to evaluate the molecular similarity search performance of these recent indexing algorithms. To avoid potential package dependency issues, two separate benchmarks are built on two popular container technologies, Docker and Singularity. Singularity is a relatively new container framework designed specifically for high-performance computing (HPC) platforms; it requires neither privileged permissions nor a separate daemon process. Both benchmarks can be extended with new indexing algorithms, benchmarking data sets, and customized parameter settings. Our results demonstrate that graph-based methods, such as Hnsw and Onng, consistently achieve the best trade-off between search effectiveness and search efficiency. The source code of the benchmark systems can be downloaded from https://github.uconn.edu/mldrugdiscovery/MssBenchmark.
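As a rough illustration of the exhaustive baseline that the abstract contrasts with indexing, the sketch below computes Tanimoto similarity on binary fingerprints and scans a whole database for the top-k matches. It is not the authors' implementation: the function names `tanimoto` and `exhaustive_search`, the use of Python integers as bit vectors, the toy 8-bit fingerprints, and the convention that two empty fingerprints have similarity 1.0 are all assumptions made for this example.

```python
from heapq import nlargest


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two binary fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    if union == 0:
        # Assumed convention: two all-zero fingerprints count as identical.
        return 1.0
    return bin(a & b).count("1") / union


def exhaustive_search(query: int, database: list, k: int = 3) -> list:
    """Score every fingerprint against the query and keep the k best.

    Returns (index, similarity) pairs sorted by decreasing similarity.
    The cost grows linearly with the database size, which is the cost
    that indexing algorithms aim to reduce.
    """
    scored = ((i, tanimoto(query, fp)) for i, fp in enumerate(database))
    return nlargest(k, scored, key=lambda pair: pair[1])


# Toy 8-bit fingerprints (hypothetical data, not taken from the paper).
database = [0b11110000, 0b11100000, 0b00001111, 0b10101010]
query = 0b11110000
top = exhaustive_search(query, database, k=2)
print(top)
```

An approximate index such as Hnsw or Onng would replace the linear scan in `exhaustive_search` with a graph traversal, trading a small loss in recall for far fewer similarity evaluations.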
Pages: 6167-6184
Number of pages: 18