Parallelizing Query Optimization on Shared-Nothing Architectures

Authors
Trummer, Immanuel [1]
Koch, Christoph [1]
Affiliations
[1] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland
Source
Proceedings of the VLDB Endowment | 2016 / Vol. 9 / No. 9
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data processing systems offer an ever increasing degree of parallelism on the levels of cores, CPUs, and processing nodes. Query optimization must exploit high degrees of parallelism in order not to gradually become the bottleneck of query evaluation. We show how to parallelize query optimization at a massive scale. We present algorithms for parallel query optimization in left-deep and bushy plan spaces. At optimization start, we divide the plan space for a given query into partitions of equal size that are explored in parallel by worker nodes. At the end of optimization, each worker returns the optimal plan in its partition to the master which determines the globally optimal plan from the partition-optimal plans. No synchronization or data exchange is required during the actual optimization phase. The amount of data sent over the network, at the start and at the end of optimization, as well as the complexity of serial steps within our algorithms increase only linearly in the number of workers and in the query size. The time and space complexity of optimization within one partition decreases uniformly in the number of workers. We parallelize single- and multi-objective query optimization over a cluster with 100 nodes in our experiments, using more than 250 concurrent worker threads (Spark executors). Despite high network latency and task assignment overheads, parallelization yields speedups of up to one order of magnitude for large queries whose optimization takes minutes on a single node.
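The partition-then-merge flow described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the cardinalities in `CARD`, the uniform `SELECTIVITY`, the toy cost function, and all function names are hypothetical. The sketch splits the left-deep plan space using precedence constraints on disjoint table pairs, which is one simple way to get partitions of equal size, and it enumerates each partition by brute force instead of dynamic programming.

```python
from itertools import permutations

# Hypothetical toy inputs: table cardinalities and a uniform join selectivity.
CARD = {"A": 1000, "B": 50, "C": 200, "D": 10}
SELECTIVITY = 0.01


def plan_cost(order):
    """Toy cost of a left-deep plan: sum of intermediate result sizes."""
    size = CARD[order[0]]
    cost = 0.0
    for table in order[1:]:
        size = size * CARD[table] * SELECTIVITY
        cost += size
    return cost


def partition_constraints(tables, worker_id, num_bits):
    """Map a worker id to precedence constraints on disjoint table pairs.

    Bit i of worker_id decides whether tables[2i] must precede tables[2i+1]
    or the other way round, so every bit halves the plan space.
    Requires 2 * num_bits <= len(tables).
    """
    constraints = []
    for i in range(num_bits):
        a, b = tables[2 * i], tables[2 * i + 1]
        constraints.append((a, b) if (worker_id >> i) & 1 == 0 else (b, a))
    return constraints


def optimize_partition(tables, constraints):
    """Worker step: cheapest join order that satisfies the constraints."""
    best = None
    for order in permutations(tables):
        pos = {t: i for i, t in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in constraints):
            cost = plan_cost(order)
            if best is None or cost < best[0]:
                best = (cost, order)
    return best


def optimize(tables, num_bits):
    """Master step: gather partition-optimal plans and keep the cheapest."""
    results = [
        # In a cluster, each call would run as an independent task.
        optimize_partition(tables, partition_constraints(tables, w, num_bits))
        for w in range(2 ** num_bits)
    ]
    return min(results)


tables = sorted(CARD)
best_cost, best_order = optimize(tables, num_bits=2)
```

Each worker needs only the query and its own id to derive its partition, and the only result it sends back is a single (cost, plan) pair, so the workers never have to talk to each other while they search.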
Pages: 660-671
Page count: 12