Parallelizing Query Optimization on Shared-Nothing Architectures

Authors
Trummer, Immanuel [1 ]
Koch, Christoph [1 ]
Affiliations
[1] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2016 / Volume 9 / Issue 09
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Data processing systems offer an ever increasing degree of parallelism on the levels of cores, CPUs, and processing nodes. Query optimization must exploit high degrees of parallelism in order not to gradually become the bottleneck of query evaluation. We show how to parallelize query optimization at a massive scale. We present algorithms for parallel query optimization in left-deep and bushy plan spaces. At optimization start, we divide the plan space for a given query into partitions of equal size that are explored in parallel by worker nodes. At the end of optimization, each worker returns the optimal plan in its partition to the master which determines the globally optimal plan from the partition-optimal plans. No synchronization or data exchange is required during the actual optimization phase. The amount of data sent over the network, at the start and at the end of optimization, as well as the complexity of serial steps within our algorithms increase only linearly in the number of workers and in the query size. The time and space complexity of optimization within one partition decreases uniformly in the number of workers. We parallelize single- and multi-objective query optimization over a cluster with 100 nodes in our experiments, using more than 250 concurrent worker threads (Spark executors). Despite high network latency and task assignment overheads, parallelization yields speedups of up to one order of magnitude for large queries whose optimization takes minutes on a single node.
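The partition-then-merge scheme described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy cost model, the table cardinalities, the function names, and the choice to split the left-deep plan space with precedence constraints on disjoint table pairs (a simple way to obtain partitions of equal size). Workers are simulated by a plain loop rather than by cluster nodes.

```python
from itertools import permutations

# Hypothetical toy inputs: table cardinalities and one uniform join selectivity.
CARD = {"A": 1000, "B": 50, "C": 200, "D": 10}
SELECTIVITY = 0.01


def plan_cost(order):
    """Toy cost of a left-deep plan: sum of intermediate result sizes."""
    size = CARD[order[0]]
    cost = 0.0
    for table in order[1:]:
        size = size * CARD[table] * SELECTIVITY
        cost += size
    return cost


def partition_constraints(tables, partition_id, num_bits):
    """Decode a partition id into precedence constraints on disjoint table pairs.

    Bit i decides whether tables[2i] must precede tables[2i+1] or the reverse
    (an assumed partitioning scheme; it yields 2**num_bits equal-size parts).
    """
    constraints = []
    for i in range(num_bits):
        first, second = tables[2 * i], tables[2 * i + 1]
        if (partition_id >> i) & 1:
            first, second = second, first
        constraints.append((first, second))
    return constraints


def optimize_partition(tables, constraints):
    """Worker: exhaustively search one partition, with no communication."""
    best = None
    examined = 0
    for order in permutations(tables):
        pos = {t: i for i, t in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in constraints):
            examined += 1
            cost = plan_cost(order)
            if best is None or cost < best[0]:
                best = (cost, order)
    return best[0], best[1], examined


def optimize(tables, num_bits):
    """Master: hand out partition ids, then keep the cheapest returned plan."""
    results = [
        optimize_partition(tables, partition_constraints(tables, pid, num_bits))
        for pid in range(2 ** num_bits)
    ]
    best_cost, best_plan, _ = min(results, key=lambda r: r[0])
    return best_cost, best_plan, [r[2] for r in results]


TABLES = sorted(CARD)
best_cost, best_plan, sizes = optimize(TABLES, num_bits=2)
print(best_plan, best_cost, sizes)
```

In this sketch the master sends each worker only a partition id and receives a single plan back, which mirrors the abstract's point that no data exchange is needed during the search itself. A real optimizer would replace the exhaustive enumeration inside each partition with dynamic programming.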
Pages: 660 - 671
Page count: 12
Related papers
50 in total
  • [31] Dynamic data reallocation for skew management in shared-nothing parallel databases
    Helal, AS
    Yuan, D
    ElRewini, H
    DISTRIBUTED AND PARALLEL DATABASES, 1997, 5 (03) : 271 - 288
  • [32] STRETCH: Virtual Shared-Nothing Parallelism for Scalable and Elastic Stream Processing
    Gulisano, Vincenzo
    Najdataei, Hannaneh
    Nikolakopoulos, Yiannis
    Papadopoulos, Alessandro, V
    Papatriantafilou, Marina
    Tsigas, Philippas
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (12) : 4221 - 4238
  • [33] Performance evaluation of three logging schemes for a shared-nothing database server
    Wong, Kam-Fai
    Simulation Practice and Theory, 1998, 6 (04) : 337 - 368
  • [34] Parallel recovery method in shared-nothing spatial database cluster system
    YOU Byeong seob
    KIM Myung keun
    ZOU Yong gui
    BAE Hae young
    重庆邮电大学学报(自然科学版), 2004, (05) : 173 - 180
  • [35] Caching and database scaling in distributed shared-nothing information retrieval systems
    Tomasic, Anthony
    Garcia-Molina, Hector
    SIGMOD Record, 1993, 22 (02) : 129 - 138
  • [36] Performance issues in distributed shared-nothing information-retrieval systems
    Tomasic, A
    GarciaMolina, H
    INFORMATION PROCESSING & MANAGEMENT, 1996, 32 (06) : 647 - 665
  • [37] Performance evaluation of three logging schemes for a shared-nothing database server
    Chinese Univ, Shatin, Hong Kong
    Simul Pract Theory, 4 (337-368):
  • [38] Approaches to balancing data load of shared-nothing clusters and their performance comparison
    Wang, JH
    Tsutaya, Y
    Segawa, N
    Yamane, S
    Murayama, Y
    Miyazaki, M
    Suzuki, H
    NINTH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS, PROCEEDINGS, 2002, : 293 - 299
  • [39] Graph Partitioning Strategies for Efficient BFS in Shared-Nothing Parallel Systems
    Muntes-Mulero, Victor
    Martinez-Bazan, Norbert
    Larriba-Pey, Josep-Lluis
    Pacitti, Esther
    Valduriez, Patrick
    WEB-AGE INFORMATION MANAGEMENT, 2010, 6185 : 13 - +
  • [40] Parallel relational operations using clustered surrogate files on shared-nothing multiprocessors
    Chung, SM
    INFORMATION SCIENCES, 1998, 105 (1-4) : 1 - 29