Incremental Partitioning for Efficient Spatial Data Analytics

Cited by: 4
Authors
Vu, Tin [1]
Eldawy, Ahmed [1]
Hristidis, Vagelis [1]
Tsotras, Vassilis [1]
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2021 / Vol. 15 / Issue 03
Funding
National Science Foundation (USA);
Keywords
SEARCH-TREES;
DOI
10.14778/3494124.3494150
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Big spatial data has become ubiquitous, from mobile applications to satellite data. In most of these applications, data is continuously growing to huge volumes. Existing systems for big spatial data organize records at either the record level or the block level. Systems that use record-level structures include key-value stores and LSM-Tree stores, which support insert and delete operations and are optimized for highly selective queries. On the other hand, systems like GeoSpark that use block-level structures (e.g., 128 MB each) are more efficient for analytical queries, but they cannot incrementally maintain the partitioned data and do not support delete operations. This paper proposes a general framework that enables block-level systems to incrementally maintain spatial partitions, in the presence of bulk insertions and deletions, in distributed file system (DFS) blocks. We first formally study the incremental spatial partitioning problem for big data and demonstrate its NP-hardness. Then, we propose a cost model to estimate the performance of queries on the partitioned data and the effect of modifying it as the data grows. After that, we provide three different implementations of the incremental partitioning framework. Comprehensive experiments on large real datasets show that our proposed partitioning algorithms outperform state-of-the-art spatial partitioning methods.
Pages: 713 - 726
Number of pages: 14