Scheduling distributed multiway spatial join queries: optimization models and algorithms

Times cited: 0
Authors
de Oliveira, Thiago Borges [1]
Costa, Fabio M. [2]
Foulds, Les R. [2]
Longo, Humberto J. [2]
Affiliations
[1] Univ Fed Jatai, Unidade Acad Ciencias Exatas & Tecnol, Jatai, Brazil
[2] Univ Fed Goias, Inst Informat, Goiania, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Multiway spatial join; Distributed query scheduling; Lagrangian relaxation; Selectivity estimation; Apache Spark; Management; MapReduce
DOI
10.1080/13658816.2023.2170380
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multiway spatial joins are a common and fundamental type of query in spatial data processing. This article presents models and algorithms for scheduling such queries in distributed database systems, seeking a balance between makespan and communication cost. We propose three algorithms based on combinatorial optimization: the well-known technique of rounding the solution of a linear programming relaxation (LP), a more sophisticated Lagrangian relaxation method (LR), and a greedy heuristic (GR) that serves as a baseline for comparison. Our evaluation shows that, when scheduling a query on 64 machines, a schedule built with GR consumes on average 22% more processing and communication resources than the more elaborate schedule constructed with LR. On average, the LR schedule is also an order of magnitude closer to the optimal schedule than the GR schedule. We show that scheduling gigabyte-scale multiway queries before execution can reduce their processing time by an order of magnitude compared with state-of-the-art spatial data processing frameworks that lack this capability, and can significantly reduce the amount of data shuffled across the network.
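The abstract names a greedy heuristic (GR) as its baseline but does not describe it. Purely to illustrate the makespan versus communication trade-off, the sketch below shows one generic greedy list scheduler; it is not the authors' GR. The task model is a hypothetical assumption: each join task `j` has a processing cost `proc[j]`, `data[j][k]` is the amount of that task's input already stored on machine `k`, and the weights `alpha` and `beta` (also hypothetical) price makespan growth against data that must be shuffled to the chosen machine.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Schedule:
    assignment: list[int]  # assignment[j] = machine chosen for task j
    loads: list[float]     # total processing time placed on each machine
    comm: float            # total input data shuffled over the network

    @property
    def makespan(self) -> float:
        return max(self.loads)


def greedy_schedule(proc, data, alpha=1.0, beta=1.0):
    """Hypothetical greedy scheduler (not the paper's GR).

    proc[j]    -- processing cost of task j (assumed unit: seconds)
    data[j][k] -- amount of task j's input stored on machine k (assumed: MB)
    alpha      -- assumed weight on the increase in makespan
    beta       -- assumed weight on data shuffled to the chosen machine
    Assumes at least one task and one machine.
    """
    n, m = len(proc), len(data[0])
    loads = [0.0] * m
    assignment = [-1] * n
    comm = 0.0
    # Consider the longest tasks first (LPT order, as in classic list scheduling).
    for j in sorted(range(n), key=lambda t: proc[t], reverse=True):
        total = sum(data[j])
        current = max(loads)
        best_k, best_score = None, float("inf")
        for k in range(m):
            shuffled = total - data[j][k]  # input not already local to machine k
            new_makespan = max(current, loads[k] + proc[j])
            score = alpha * (new_makespan - current) + beta * shuffled
            if score < best_score:
                best_k, best_score = k, score
        # Commit the cheapest machine for this task; never revisited later.
        assignment[j] = best_k
        loads[best_k] += proc[j]
        comm += total - data[j][best_k]
    return Schedule(assignment, loads, comm)


if __name__ == "__main__":
    s = greedy_schedule([4.0, 3.0, 2.0, 2.0],
                        [[10, 0], [0, 10], [5, 5], [8, 2]],
                        alpha=1.0, beta=0.1)
    print(s.assignment, s.makespan, s.comm)  # [0, 1, 1, 0] 6.0 7.0
```

Raising `beta` pushes tasks toward the machines that already hold their input, while raising `alpha` favors load balance. Because each task is committed once and never reconsidered, a one-pass heuristic of this kind can drift away from the optimum, which is the gap that optimization-based methods such as LP rounding and Lagrangian relaxation aim to narrow.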
Pages: 1388-1419
Page count: 32
Related papers
(50 in total)
  • [41] Optimization of parallel execution for multi-join queries
    Chen, MS
    Yu, PS
    Wu, KL
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1996, 8 (03) : 416 - 428
  • [42] Efficient Massively Parallel Join Optimization for Large Queries
    Mancini, Riccardo
    Karthik, Srinivas
    Chandra, Bikash
    Mageirakos, Vasilis
    Ailamaki, Anastasia
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 122 - 135
  • [43] Genetic Optimization for the Join Ordering Problem of Database Queries
    Chande, Swati V.
    Sinha, Madhavi
    2011 ANNUAL IEEE INDIA CONFERENCE (INDICON-2011): ENGINEERING SUSTAINABLE SOLUTIONS, 2011
  • [44] An optimal evaluation of groupby-join queries in distributed architectures
    Hassan, M. Al Hajj
    Bamha, M.
    WEBIST 2007: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND TECHNOLOGIES, VOL IT: INTERNET TECHNOLOGY, 2007, : 246 - +
  • [45] Incorporating Processor Costs in Optimizing the Distributed Execution of Join Queries
    REID, DJ
    MATHEMATICAL AND COMPUTER MODELLING, 1994, 20 (03) : 7 - 29
  • [46] Towards a Learned Cost Model for Distributed Spatial Join: Data, Code & Models
    Vu, Tin
    Belussi, Alberto
    Migliorini, Sara
    Eldawy, Ahmed
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 4550 - 4554
  • [47] Distributed Spatial Join Processing for Multiple Spatial Datasets - Multi-way Spatial Join
    Cunha, Anderson R.
    de Oliveira, Savio S. T.
    de Oliveira, Thiago B.
    Aleixo, Everton L.
    Cardoso, Marcelo de C.
    do Sacramento Rodrigues, Vagner J.
    2015 XXXIII BRAZILIAN SYMPOSIUM ON COMPUTER NETWORKS AND DISTRIBUTED SYSTEMS, 2015, : 171 - 181
  • [48] Distributed Execution of Spatial SQL Queries
    Giannousis, Konstantinos
    Bereta, Konstantina
    Karalis, Nikolaos
    Koubarakis, Manolis
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 528 - 533
  • [49] A parallel spatial join processing for distributed spatial databases
    Kang, MS
    Ko, SK
    Koh, K
    Choy, YC
    FLEXIBLE QUERY ANSWERING SYSTEMS, PROCEEDINGS, 2002, 2522 : 212 - 225
  • [50] Adaptive Join Algorithms in Dynamic Distributed Databases
    Yu, Min J.
    Sheu, P. C.-Y.
    DISTRIBUTED AND PARALLEL DATABASES, 1997, 5 : 5 - 30