Fast OLAP Query Execution in Main Memory on Large Data in a Cluster

Cited by: 0
Authors
Weidner, Martin [1]
Dees, Jonathan [1, 2]
Sanders, Peter [2]
Affiliations
[1] SAP AG, D-69190 Walldorf, Germany
[2] Karlsruhe Inst Technol, D-76128 Karlsruhe, Germany
Keywords
Distributed databases; Distributed computing; Parallel processing; Query processing; Data analysis; Data warehouses; Algorithm; Systems
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Main memory column-stores have proven to be efficient for processing analytical queries. Still, there has been little work in the context of clusters. Using only a single machine poses several restrictions: processing power and data volume are bounded by the number of cores and the amount of main memory that fit on one tightly coupled system. To enable the processing of larger data sets, switching to a cluster becomes necessary. In this work, we explore techniques for efficient execution of analytical SQL queries on large amounts of data in a parallel database cluster while making maximal use of the available hardware. This includes precompiled query plans for efficient CPU utilization, full parallelization on single nodes and across the cluster, and efficient inter-node communication. We implement all features in a prototype for running a subset of TPC-H benchmark queries. We evaluate our implementation on a 128-node cluster running TPC-H queries with 30,000 gigabytes of uncompressed data. Currently, there are no official cluster results for more than 10,000 gigabytes of data; at that scale, we achieve up to one to two orders of magnitude better performance than the current record holder.
Pages: 7
Related papers
50 in total
  • [1] DATA-FLOW QUERY EXECUTION IN A PARALLEL MAIN-MEMORY ENVIRONMENT
    WILSCHUT, AN
    APERS, PMG
    [J]. DISTRIBUTED AND PARALLEL DATABASES, 1993, 1 (01) : 103 - 128
  • [2] Fast and Energy-Efficient OLAP Data Management on Hybrid Main Memory Systems
    Hassan, Ahmad
    Nikolopoulos, Dimitrios
    Vandierendonck, Hans
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 2019, 68 (11) : 1597 - 1611
  • [3] MiNT-OLAP cluster: minimizing network transmission cost in OLAP cluster for main memory analytical database
    Jiao, Min
    Zhang, Yansong
    Wang, Zhanwei
    Wang, Shan
    [J]. FRONTIERS OF COMPUTER SCIENCE, 2012, 6 (06) : 668 - 676
  • [4] MiNT-OLAP cluster: minimizing network transmission cost in OLAP cluster for main memory analytical database
    Min Jiao
    Yansong Zhang
    Zhanwei Wang
    Shan Wang
    [J]. Frontiers of Computer Science, 2012, 6 : 668 - 676
  • [5] What-if query processing policy of main-memory OLAP system
    Zhang Y.-S.
    Xiao Y.-Q.
    Wang S.
    Chen H.
    [J]. Ruan Jian Xue Bao/Journal of Software, 2010, 21 (10) : 2494 - 2512
  • [6] Research on multicore parallel query processing techniques for main-memory OLAP
    [J]. Zhang, Yan-Song, 1895, Science Press (37):
  • [7] OLAP query processing in a database cluster
    Lima, AAB
    Mattoso, M
    Valduriez, P
    [J]. EURO-PAR 2004 PARALLEL PROCESSING, PROCEEDINGS, 2004, 3149 : 355 - 362
  • [8] Feisu: Fast Query Execution over Heterogeneous Data Sources on Large-Scale Clusters
    Qin, An
    Yuan, Yuan
    Tan, Dai
    Sun, Pengyu
    Zhang, Xiang
    Cao, Hao
    Lee, Rubao
    Zhang, Xiaodong
    [J]. 2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), 2017, : 1173 - 1182
  • [9] Distributed aggregate functions enabled parallel main-memory OLAP query processing technique
    Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education, Beijing 100872, China
    Unknown
    Unknown
    [J]. Ruan Jian Xue Bao, 2009, SUPPL. 1 (165-175):
  • [10] OLAP query routing and physical design in a database cluster
    Röhm, U
    Böhm, K
    Schek, HJ
    [J]. ADVANCES IN DATABASE TECHNOLOGY-EDBT 2000, PROCEEDINGS, 2000, 1777 : 254 - 268