SparkGIS: Resource Aware Efficient In-Memory Spatial Query Processing

Cited by: 34
Authors
Baig, Furqan [1]
Hoang Vo [1]
Kurc, Tahsin [1]
Saltz, Joel [1]
Wang, Fusheng [1]
Affiliations
[1] SUNY Stony Brook, Stony Brook, NY 11794 USA
Keywords
Spatial processing; MapReduce; Spark; In-Memory processing; SYSTEM
DOI
10.1145/3139958.3140019
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Much effort has been devoted to supporting high-performance spatial queries on large volumes of spatial data in distributed spatial computing systems, especially in the MapReduce paradigm. Recent work has focused on extending spatial MapReduce frameworks to leverage the high-performance in-memory distributed processing capabilities of systems such as Spark. However, the performance advantage requires sufficient memory and comprehensive configuration. When these requirements are not met, processing falls back to disk I/O, defeating the purpose of such systems, or in the worst case the job runs out of memory and fails. The problem is aggravated for spatial processing, since the underlying in-memory systems are oblivious to spatial data features and characteristics. In this paper we present SparkGIS, an in-memory oriented spatial data querying system for high-throughput, low-latency spatial query handling that adapts Apache Spark's distributed processing capabilities. It supports basic spatial queries, including containment, spatial join, and k-nearest neighbor, and allows these to be extended into complex query pipelines. SparkGIS mitigates skew in distributed processing by supporting several dynamic partitioning algorithms suitable for a rich set of contemporary application scenarios. Multi-level global and local in-memory indexes, both pre-generated and on-demand, allow SparkGIS to prune input data and apply compute-intensive operations only to the subset of relevant spatial objects. Finally, SparkGIS employs dynamic query rewriting to gracefully manage large spatial query workflows that exceed available distributed resources. Our comparative evaluation shows that SparkGIS performs on par with contemporary Spark-based platforms for relatively small queries and outperforms them for larger data and memory-intensive workflows, owing to dynamic query rewriting and efficient spatial data management.
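The abstract's idea of pruning input with an index and running the expensive test only on surviving candidates can be illustrated with a generic filter-and-refine spatial join. The sketch below is not the SparkGIS implementation: it runs in a single process without Spark, uses a fixed uniform grid in place of the paper's dynamic partitioning, and treats every object as an axis-aligned bounding box. All names (`Box`, `cells_for`, `grid_join`, the `cell` parameter) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: each spatial object is reduced to a bounding box
# (min_x, min_y, max_x, max_y); real geometries would need an exact test.
Box = Tuple[float, float, float, float]


def boxes_intersect(a: Box, b: Box) -> bool:
    """Refine step: exact intersection test for two axis-aligned boxes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cells_for(box: Box, cell: float) -> List[Tuple[int, int]]:
    """Return every grid cell (column, row) that the box overlaps."""
    x0, y0 = int(box[0] // cell), int(box[1] // cell)
    x1, y1 = int(box[2] // cell), int(box[3] // cell)
    return [(i, j) for i in range(x0, x1 + 1) for j in range(y0, y1 + 1)]


def grid_join(left: Dict[str, Box], right: Dict[str, Box],
              cell: float = 10.0) -> List[Tuple[str, str]]:
    """Join two datasets on box intersection using a uniform grid index."""
    # Build the index: map each grid cell to the right-side objects in it.
    grid = defaultdict(list)
    for rid, rbox in right.items():
        for c in cells_for(rbox, cell):
            grid[c].append(rid)

    pairs = set()  # a set removes duplicates from objects spanning many cells
    for lid, lbox in left.items():
        for c in cells_for(lbox, cell):
            # Filter step: only objects sharing a cell are candidates.
            for rid in grid.get(c, ()):
                if boxes_intersect(lbox, right[rid]):
                    pairs.add((lid, rid))
    return sorted(pairs)


left = {"a": (0, 0, 5, 5), "b": (50, 50, 55, 55)}
right = {"x": (4, 4, 12, 12), "y": (100, 100, 101, 101)}
print(grid_join(left, right))
```

In a distributed setting, each grid cell would roughly correspond to a partition, which is why a skewed distribution of objects across cells translates into skewed work across nodes.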
Pages: 10