A distributed B+Tree indexing method for processing range queries over streaming data

Cited by: 0
Authors
Shahab Safaee
Meghdad Mirabi
Amir Masoud Rahmani
Ali Asghar Safaei
Affiliations
[1] Islamic Azad University, Department of Computer Engineering, Faculty of Engineering, South Tehran Branch
[2] National Yunlin University of Science and Technology, Future Technology Research Center
[3] Tarbiat Modares University, Department of Medical Informatics, Faculty of Medical Sciences
Source
Cluster Computing | 2024 / Vol. 27
Keywords
B+Tree index; Distributed query processing; Map-Reduce model; Range query; Streaming data
DOI
Not available
Abstract
A data stream is a massive, unbounded sequence of data elements generated continuously at a high rate. Stream databases raise new challenges for query processing, both because streaming data changes constantly over time and because users submit a wider range of queries than in traditional databases. In this paper, we propose a system architecture with components for distributed indexing of streaming data and for distributed processing of range queries on streaming data. Instead of building one large, centralized B+Tree index, we build a set of small B+Tree indexes, one for every partition of the streaming data. We also design a distributed range search algorithm that each machine in a Spark cluster can use to process range queries independently on each partition of the streaming data. With this architecture, both indexing and querying of streaming data can be performed in a distributed and parallel manner. Through several experiments, we demonstrate that the proposed indexing method is scalable and efficient for processing range queries on streaming data compared with existing centralized B+Tree indexing methods; it is therefore suitable for applications involving data streams with a large volume of data elements and a large number of range queries.
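The idea of one small index per partition, queried independently and then merged, can be illustrated with a minimal sketch. This is not the authors' implementation: a sorted list searched with `bisect` stands in for each small B+Tree, a thread pool stands in for the machines of a Spark cluster, and round-robin partitioning is assumed. The names `PartitionIndex`, `build_indexes`, and `distributed_range_query` are hypothetical.

```python
from bisect import bisect_left, bisect_right, insort
from concurrent.futures import ThreadPoolExecutor


class PartitionIndex:
    """Sorted-key index for one partition (assumed stand-in for a small B+Tree)."""

    def __init__(self):
        self.keys = []

    def insert(self, key):
        # Keep keys ordered, as the leaf level of a B+Tree would.
        insort(self.keys, key)

    def range_search(self, low, high):
        # Return every key k with low <= k <= high in this partition.
        start = bisect_left(self.keys, low)
        end = bisect_right(self.keys, high)
        return self.keys[start:end]


def build_indexes(stream, num_partitions):
    # Assumption: elements are assigned to partitions round-robin.
    indexes = [PartitionIndex() for _ in range(num_partitions)]
    for i, key in enumerate(stream):
        indexes[i % num_partitions].insert(key)
    return indexes


def distributed_range_query(indexes, low, high):
    # Each partition answers the query independently; the partial
    # results are then merged into one sorted answer.
    with ThreadPoolExecutor(max_workers=len(indexes)) as pool:
        partials = pool.map(lambda idx: idx.range_search(low, high), indexes)
        return sorted(k for part in partials for k in part)


stream = [17, 3, 42, 8, 25, 11, 30, 5, 19, 27]
indexes = build_indexes(stream, num_partitions=3)
result = distributed_range_query(indexes, 10, 30)
print(result)  # [11, 17, 19, 25, 27, 30]
```

Because no partition needs data from another to answer a range query, adding partitions shortens each local search, at the cost of a final merge step.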
Pages: 1251-1274
Page count: 23
Related papers
50 items in total
  • [41] Waterwheel: Realtime Indexing and Temporal Range Query Processing over Massive Data Streams
    Wang, Li
    Cai, Ruichu
    Fu, Tom Z. J.
    He, Jiong
    Lu, Zijie
    Winslett, Marianne
    Zhang, Zhenjie
    2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018, : 269 - 280
  • [42] Robust Distributed Query Processing for Streaming Data
    Lei, Chuan
    Rundensteiner, Elke A.
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2014, 39 (02):
  • [43] Distributed Processing of Approximate Range Queries in Wireless Sensor Networks
    Hu, Haifeng
    He, Jiefang
    Wu, Jiansheng
    2015 SEVENTH INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS AND PROGRAMMING (PAAP), 2015, : 45 - 51
  • [44] Fjording the stream: An architecture for queries over streaming sensor data
    Madden, S
    Franklin, MJ
    18TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2002, : 555 - 566
  • [45] Pre-processing and Indexing Techniques for Constellation Queries in Big Data
    Khatibi, Amir
    Porto, Fabio
    Rittmeyer, Joao Guilherme
    Ogasawara, Eduardo
    Valduriez, Patrick
    Shasha, Dennis
    BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, DAWAK 2017, 2017, 10440 : 164 - 172
  • [46] Processing Analytical Queries over Encrypted Data
    Tu, Stephen
    Kaashoek, M. Frans
    Madden, Samuel
    Zeldovich, Nickolai
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (05): : 289 - 300
  • [47] Efficient Indexing Multiple Multidimensional Continuous Queries over Data Stream
    Hou, Dongfeng
    Liu, Qingbao
    Lu, Changhui
    Zhang, Weiming
    PROCEEDINGS OF 2010 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY (ICCSIT 2010), VOL 6, 2010, : 594 - 598
  • [48] Processing Regular Path Queries on Arbitrarily Distributed Data
    Davoust, Alan
    Esfandiari, Babak
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2016 CONFERENCES, 2016, 10033 : 844 - 861
  • [49] Cost-based solution for optimizing multi-join queries over distributed streaming sensor data
    Gomes, Joseph
    Choi, Hyeong-Ah
    2006 INTERNATIONAL CONFERENCE ON COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING, 2006, : 282 - +
  • [50] Monadic queries over tree-structured data
    Gottlob, G
    Koch, C
    17TH ANNUAL IEEE SYMPOSIUM ON LOGIC IN COMPUTER SCIENCE, PROCEEDINGS, 2002, : 189 - 202