The role of classifiers and data complexity in learned Bloom filters: insights and recommendations

Cited by: 1
Authors
Malchiodi, Dario [1, 2]
Raimondi, Davide [1]
Fumagalli, Giacomo [1]
Giancarlo, Raffaele [3]
Frasca, Marco [1]
Affiliations
[1] Univ Milan, Dept Comp Sci, Via Celoria 18, I-20133 Milan, Italy
[2] Univ Rome, CINI Natl Lab Artificial Intelligence & Intelligen, I-00185 Rome, Italy
[3] Univ Palermo, Dept Math & CS, Via Archirafi 34, I-90123 Palermo, Italy
Keywords
Bloom filters; Learned Bloom filters; Approximate set membership; Dataset complexity; Competence; Domains
DOI
10.1186/s40537-024-00906-9
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Bloom filters, since their introduction over 50 years ago, have become a pillar for handling membership queries in small space, with relevant applications in Big Data mining and stream processing. Further improvements have recently been proposed through Machine Learning techniques, in the form of learned Bloom filters. These make the proper parameter setting of this multi-criteria data structure considerably more complicated, in particular regarding the choice of one of its key components (the classifier) and the need to account for the classification complexity of the input dataset. Given this State of the Art, our contributions are as follows. (1) A novel methodology, supported by software, for designing, analyzing and implementing learned Bloom filters, which accounts for the filters' multi-criteria nature, in particular concerning the choice of classifier type and the classification complexity of the data. Extensive experiments show the validity of the proposed methodology and, since our software is public, we offer a valid tool to practitioners interested in using learned Bloom filters. (2) Further contributions to the advancement of the State of the Art, of great practical relevance, are the following: (a) the classifier inference time should not be taken as a proxy for the filter reject time; (b) of the many classifiers we have considered, only two offer good performance, a result that agrees with and further strengthens early findings in the literature; (c) the Sandwiched Bloom filter, already known as one of the reference designs in this area, is further shown here to be remarkably robust to data complexity and to variability in classifier performance.
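The abstract names three designs (the classic Bloom filter, the learned Bloom filter and the Sandwiched learned Bloom filter) without showing how each one answers a query. What follows is a minimal Python sketch that assumes the common textbook layout. A learned filter accepts a key when a classifier's score reaches a threshold, and otherwise defers to a backup Bloom filter that holds the classifier's false negatives. The sandwiched variant places an initial Bloom filter in front of that. All class names, the SHA-256 double-hashing scheme, the threshold values and `toy_classifier` are hypothetical illustrations. They are not the authors' published software or their experimental classifiers.

```python
import hashlib
import math


class BloomFilter:
    """Classic Bloom filter sized for n_items keys at a target false positive rate."""

    def __init__(self, n_items, fpr):
        n = max(1, n_items)  # avoid a zero-size filter when there is nothing to store
        # Textbook sizing: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hashes.
        self.m = max(1, math.ceil(-n * math.log(fpr) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key):
        # Double hashing (an assumption of this sketch): k positions from two
        # 64-bit halves of a single SHA-256 digest.
        digest = hashlib.sha256(repr(key).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        # No false negatives: every bit set by add() stays set.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class LearnedBloomFilter:
    """Classifier plus threshold, backed by a Bloom filter of the classifier's false negatives."""

    def __init__(self, keys, classifier, threshold, backup_fpr):
        self.classifier = classifier  # any callable mapping a key to a score in [0, 1]
        self.threshold = threshold
        false_negatives = [x for x in keys if classifier(x) < threshold]
        self.backup = BloomFilter(len(false_negatives), backup_fpr)
        for x in false_negatives:
            self.backup.add(x)

    def __contains__(self, key):
        if self.classifier(key) >= self.threshold:
            return True  # accepted by the classifier (possibly a false positive)
        return key in self.backup  # keys scored too low get a second chance here


class SandwichedLearnedBloomFilter:
    """Initial Bloom filter placed in front of a learned Bloom filter."""

    def __init__(self, keys, classifier, threshold, initial_fpr, backup_fpr):
        self.initial = BloomFilter(len(keys), initial_fpr)
        for x in keys:
            self.initial.add(x)
        self.learned = LearnedBloomFilter(keys, classifier, threshold, backup_fpr)

    def __contains__(self, key):
        # Short-circuit: queries rejected by the initial filter never reach the classifier.
        return key in self.initial and key in self.learned


def toy_classifier(x):
    """Hypothetical, deliberately imperfect scorer over integers."""
    if x % 3 == 0 and x % 7 != 0:
        return 0.9  # high score for most multiples of 3 (multiples of 21 are missed)
    if x % 5 == 0:
        return 0.7  # mid-range score for some other integers
    return 0.1
```

With `keys = list(range(0, 30000, 3))`, both `LearnedBloomFilter(keys, toy_classifier, 0.8, 0.01)` and `SandwichedLearnedBloomFilter(keys, toy_classifier, 0.8, 0.05, 0.01)` report every key as present. The sandwiched filter accepts a query only when its initial filter and the inner learned filter both do, so its false positive rate cannot exceed that of the same learned filter used alone. The sketch also gives one simple way to see why point (a) can hold. A negative query stopped by the initial filter never reaches the classifier, so the classifier's inference time alone does not describe the filter's reject time.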
Pages: 26
Journal: Journal of Big Data, 11