Bloom filters for molecules

Authors
Jorge Medina
Andrew D. White
Affiliation
[1] University of Rochester, Department of Chemical Engineering
Keywords
Bloom filter; Fingerprint; SMILES; Hashing
Abstract
Ultra-large chemical libraries now contain tens to hundreds of billions of molecules. A challenge for these libraries is efficiently checking whether a proposed molecule is present. Here we propose and study Bloom filters for testing whether a molecule is in a set, using either string or fingerprint representations. Bloom filters are compact enough to hold billions of molecules in just a few GB of memory and can check membership in under a millisecond. We found that string representations can achieve a false positive rate below 1% and require significantly less storage than fingerprints. Canonical SMILES combined with Bloom filters using the simple FNV (Fowler-Noll-Vo) hash function provide fast and accurate membership tests with small memory requirements. We provide a general implementation and specific filters for detecting whether a molecule is purchasable, patented, or a natural product according to existing databases at https://github.com/whitead/molbloom.
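The idea in the abstract can be illustrated with a minimal sketch of a Bloom filter over SMILES strings. This is not the molbloom implementation: the class name `BloomFilter`, the helper `bloom_size_bits`, the seeded variant of FNV-1a, and the double-hashing scheme used to derive several bit positions are all assumptions made for illustration. The sketch also assumes the strings have already been canonicalized by a cheminformatics toolkit, since two different SMILES spellings of the same molecule hash to different bits.

```python
import math

# 64-bit FNV-1a constants (offset basis and prime).
FNV_OFFSET_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes, seed: int = 0) -> int:
    """64-bit FNV-1a hash. The seed is an illustrative extension:
    it is XORed into the offset basis, and seed=0 gives plain FNV-1a."""
    h = (FNV_OFFSET_64 ^ seed) & MASK_64
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_64) & MASK_64
    return h


def bloom_size_bits(expected_items: int, fp_rate: float) -> int:
    """Standard sizing formula: m = -n * ln(p) / (ln 2)^2 bits."""
    return math.ceil(-expected_items * math.log(fp_rate) / (math.log(2) ** 2))


class BloomFilter:
    """Hypothetical minimal Bloom filter for strings such as canonical SMILES."""

    def __init__(self, expected_items: int, fp_rate: float) -> None:
        self.m = max(8, bloom_size_bits(expected_items, fp_rate))
        # Optimal number of hash functions: k = (m / n) * ln 2.
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k positions from two base hashes.
        data = item.encode("utf-8")
        h1 = fnv1a_64(data)
        h2 = fnv1a_64(data, seed=1) | 1  # force an odd stride
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


if __name__ == "__main__":
    library = BloomFilter(expected_items=1000, fp_rate=0.01)
    for smiles in ["CCO", "c1ccccc1", "CC(=O)O"]:  # assumed already canonical
        library.add(smiles)
    print("CCO" in library)  # added items are always reported present
    print(bloom_size_bits(10**9, 0.01) / 8 / 1e9, "GB for 1e9 items at 1% FP")
```

Under the standard sizing formula, a 1% false positive rate costs about 9.6 bits per item regardless of how long each string is, so one billion molecules fit in roughly 1.2 GB, consistent with the "few GB" figure in the abstract. A membership test touches only `k` bits, which is why lookups are fast; the trade-off is that a positive answer can be wrong, while a negative answer never is.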
Source
Medina, Jorge; White, Andrew D. Bloom filters for molecules [J]. Journal of Cheminformatics, 2023, 15 (01)