LocationSpark: A Distributed In-Memory Data Management System for Big Spatial Data

Cited by: 106
Authors
Tang, Mingjie [1]
Yu, Yongyang [1]
Malluhi, Qutaibah M. [2]
Ouzzani, Mourad [3]
Aref, Walid G. [1]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
[2] Qatar Univ, Doha, Qatar
[3] HBKU, Qatar Comp Res Inst, Ar Rayyan, Qatar
Source
Proceedings of the VLDB Endowment | 2016, Vol. 9, No. 13
Funding
US National Science Foundation
DOI
10.14778/3007263.3007310
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present LocationSpark, a spatial data processing system built on top of Apache Spark, a widely used distributed data processing system. LocationSpark offers a rich set of spatial query operators, e.g., range search, kNN, spatio-textual operations, spatial join, and kNN join. To achieve high performance, LocationSpark employs various spatial indexes for in-memory data, and guarantees that immutable spatial indexes have low overhead with fault tolerance. In addition, we build two new layers over Spark, namely a query scheduler and a query executor. The query scheduler is responsible for mitigating skew in spatial queries, while the query executor selects the best plan based on the indexes and the nature of the spatial queries. Furthermore, to avoid unnecessary network communication overhead when processing overlapped spatial data, we embed an efficient spatial Bloom filter into LocationSpark's indexes. Finally, LocationSpark tracks frequently accessed spatial data, and dynamically flushes less frequently accessed data to disk. We evaluate our system on real workloads and demonstrate that it achieves an order of magnitude performance gain over a baseline framework.
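The abstract does not describe how the spatial Bloom filter is built, so the following is only a minimal sketch of the general idea, not the paper's design. It assumes a uniform grid and a plain Bloom filter over occupied grid cells; the class name `GridBloomFilter`, its parameters, and the cell size are all hypothetical. A range query consults the filter before contacting a partition: a negative answer means the partition certainly holds no matching points, so the network round trip can be skipped.

```python
import hashlib
import math
from typing import Iterator, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Cell = Tuple[int, int]


class GridBloomFilter:
    """Bloom filter over the grid cells occupied by one partition's points.

    Hypothetical illustration: a generic Bloom filter on grid cells,
    not the structure used inside LocationSpark.
    """

    def __init__(self, cell_size: float, num_bits: int = 1024, num_hashes: int = 3):
        self.cell_size = cell_size
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _cell(self, x: float, y: float) -> Cell:
        # Map a coordinate to the integer index of its grid cell.
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def _positions(self, cell: Cell) -> Iterator[int]:
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            key = f"{seed}:{cell[0]}:{cell[1]}".encode()
            digest = hashlib.sha256(key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, point: Point) -> None:
        for pos in self._positions(self._cell(*point)):
            self.bits[pos] = 1

    def might_contain_cell(self, cell: Cell) -> bool:
        return all(self.bits[pos] for pos in self._positions(cell))

    def might_overlap(self, rect: Rect) -> bool:
        """False means no point lies in rect; True means 'possibly'."""
        min_x, min_y, max_x, max_y = rect
        cx0, cy0 = self._cell(min_x, min_y)
        cx1, cy1 = self._cell(max_x, max_y)
        return any(
            self.might_contain_cell((cx, cy))
            for cx in range(cx0, cx1 + 1)
            for cy in range(cy0, cy1 + 1)
        )


# Usage: build a filter for one partition, then test a query rectangle.
partition_filter = GridBloomFilter(cell_size=1.0)
for p in [(0.5, 0.5), (2.3, 4.1)]:
    partition_filter.add(p)

# The rectangle covers the cell of (0.5, 0.5); Bloom filters have no
# false negatives, so the partition must be visited.
must_visit = partition_filter.might_overlap((0.0, 0.0, 0.9, 0.9))

# An empty filter rejects every query, so that partition is never contacted.
empty_filter = GridBloomFilter(cell_size=1.0)
can_skip = not empty_filter.might_overlap((0.0, 0.0, 0.9, 0.9))
```

The trade-off in this sketch is the usual one for Bloom filters: false positives cost an unnecessary partition visit, while false negatives cannot occur, so query results stay correct.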
Pages: 1565-1568
Page count: 4