A Compact In-memory Index for Managing Set Membership Queries on Streaming Data

Authors
Wang, Yong [1]
Yun, Xiaochun [2]
Wang, Shupeng [1]
Wang, Xi [1]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] CNCERT CC, Beijing, Peoples R China
Keywords
Membership query; Bloom filter; Priority; Hit ratio
DOI
10.1007/978-3-319-42553-5_8
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Membership queries on dynamic sets are essential for applications that generate or process a continuous stream of data items. These applications often need to cache items dynamically and answer membership queries for duplicate detection on unbounded data streams. Three key challenges for the caching mechanism are limited memory space, the high precision requirement, and the different priority levels associated with items. In this paper, we propose a compact in-memory index, Bloom Filter Ring (BFR), which is more suitable for dynamic caching of items on unbounded data streams. We analyze the time complexity and precision of BFR in finite memory space, and theoretically prove that BFR has a higher expected average capacity than Aging Bloom Filter, the current state of the art. Furthermore, we propose Priority-aware BFR (PBFR) to support a membership query scheme that takes into account the priority levels of items. Experimental results show that our algorithms achieve better performance in terms of cache hit ratio and false negative rate.
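The abstract does not give the internals of BFR, so the following is only a minimal sketch of the general idea it alludes to: answering duplicate-detection queries on an unbounded stream in bounded memory by rotating through several Bloom filters and discarding the oldest one. It is not the paper's algorithm. The class names (`BloomFilter`, `RotatingBloomCache`), the rotation policy, and all parameter values are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over a fixed-size bit array."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)
        self.count = 0  # number of add() calls so far

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class RotatingBloomCache:
    """Hypothetical ring of Bloom filters (an assumption, not the paper's BFR).

    When the active filter has received `capacity_per_filter` items, the
    ring advances and the filter at the new position is replaced by an
    empty one, which forgets the oldest batch of items.
    """

    def __init__(self, num_filters: int = 4, capacity_per_filter: int = 1000,
                 num_bits: int = 16384, num_hashes: int = 5) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.capacity = capacity_per_filter
        self.filters = [BloomFilter(num_bits, num_hashes)
                        for _ in range(num_filters)]
        self.active = 0

    def seen(self, item: str) -> bool:
        # An item counts as cached if any filter in the ring reports it.
        return any(item in f for f in self.filters)

    def insert(self, item: str) -> None:
        if self.filters[self.active].count >= self.capacity:
            self.active = (self.active + 1) % len(self.filters)
            self.filters[self.active] = BloomFilter(self.num_bits,
                                                    self.num_hashes)
        self.filters[self.active].add(item)

    def check_and_insert(self, item: str) -> bool:
        """Return True if the item looks like a duplicate; otherwise cache it."""
        duplicate = self.seen(item)
        if not duplicate:
            self.insert(item)
        return duplicate
```

In this sketch, discarding a whole filter is what introduces false negatives (an evicted item is reported as new), while the filters themselves can still report false positives; these are the two error types the abstract refers to when it mentions precision and false negative rate.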
Pages: 88-98
Page count: 11