Scheduling distributed multiway spatial join queries: optimization models and algorithms

Cited by: 0
Authors
de Oliveira, Thiago Borges [1]
Costa, Fabio M. [2]
Foulds, Les R. [2]
Longo, Humberto J. [2]
Affiliations
[1] Univ Fed Jatai, Unidade Acad Ciencias Exatas & Tecnol, Jatai, Brazil
[2] Univ Fed Goias, Inst Informat, Goiania, Brazil
Funding
São Paulo Research Foundation, Brazil
Keywords
Multiway spatial join; Distributed query scheduling; Lagrangian relaxation; Selectivity estimation; Apache Spark; Management; MapReduce
DOI
10.1080/13658816.2023.2170380
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Multiway spatial joins are a common and fundamental type of query in spatial data processing. This article presents models and algorithms for scheduling this type of query in distributed database systems while balancing makespan against communication costs. We propose three algorithms based on combinatorial optimization methods: the well-known linear relaxation technique of rounding a solution generated by linear programming (LP), a more sophisticated Lagrangian relaxation method (LR), and a greedy heuristic (GR) that serves as a baseline for comparison. Our evaluation shows that, when scheduling a query for 64 machines, a schedule built using GR consumes on average 22% more processing and communication resources than the more elaborate schedule constructed via the LR method. The schedule provided by LR is also, on average, an order of magnitude closer to the optimal schedule for a query than the one provided by GR. We show that scheduling gigabyte-size multiway queries before execution can reduce their processing time by an order of magnitude compared to state-of-the-art frameworks for spatial data processing that lack this capability, and can significantly reduce the amount of data shuffled across the network.
Pages: 1388-1419
Number of pages: 32