Fast OLAP Query Execution in Main Memory on Large Data in a Cluster

Cited by: 0
Authors
Weidner, Martin [1]
Dees, Jonathan [1, 2]
Sanders, Peter [2]
Affiliations
[1] SAP AG, D-69190 Walldorf, Germany
[2] Karlsruhe Inst Technol, D-76128 Karlsruhe, Germany
Keywords
Distributed databases; Distributed computing; Parallel processing; Query processing; Data analysis; Data warehouses; ALGORITHM; SYSTEMS;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Main memory column-stores have proven to be efficient for processing analytical queries. Still, there has been little work in the context of clusters. Using only a single machine poses several restrictions: processing power and data volume are bounded by the number of cores and the main memory that fit on one tightly coupled system. To enable the processing of larger data sets, switching to a cluster becomes necessary. In this work, we explore techniques for efficient execution of analytical SQL queries on large amounts of data in a parallel database cluster while making maximal use of the available hardware. This includes precompiled query plans for efficient CPU utilization, full parallelization on single nodes and across the cluster, and efficient inter-node communication. We implement all features in a prototype for running a subset of TPC-H benchmark queries. We evaluate our implementation on a 128-node cluster running TPC-H queries with 30 000 gigabytes of uncompressed data. Currently, there are no official cluster results for more than 10 000 gigabytes of data, where we achieve up to one to two orders of magnitude better performance than the current record holder.
Pages: 7
Related papers
50 in total
  • [31] K2-treaps to Represent and Query Data Warehouses into Main Memory
    Vallejos, Cristian
    Caniupan, Monica
    Gutierrez, Gilberto
    [J]. 2017 36TH INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY (SCCC), 2017,
  • [32] Parallel OLAP query processing in database clusters with data replication
    Alexandre A. B. Lima
    Camille Furtado
    Patrick Valduriez
    Marta Mattoso
    [J]. Distributed and Parallel Databases, 2009, 25 : 97 - 123
  • [33] OLAP query reformulation in peer-to-peer data warehousing
    Golfarelli, M.
    Mandreoli, F.
    Penzo, W.
    Rizzi, S.
    Turricchia, E.
    [J]. INFORMATION SYSTEMS, 2012, 37 (05) : 393 - 411
  • [34] Parallel OLAP query processing in database clusters with data replication
    Lima, Alexandre A. B.
    Furtado, Camille
    Valduriez, Patrick
    Mattoso, Marta
    [J]. DISTRIBUTED AND PARALLEL DATABASES, 2009, 25 (1-2) : 97 - 123
  • [35] Scalable OLAP queries processing towards large cluster
    Wang, Hui-Ju
    Qin, Xiong-Pai
    Wang, Shan
    Zhang, Yan-Song
    Li, Fu-Rong
    [J]. Jisuanji Xuebao/Chinese Journal of Computers, 2015, 38 (01) : 45 - 58
  • [36] An adaptive query execution system for data integration
    Ives, ZG
    Florescu, D
    Friedman, M
    Levy, A
    Weld, DS
    [J]. SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999: SIGMOD99: PROCEEDINGS OF THE 1999 ACM SIGMOD - INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 1999, : 299 - 310
  • [37] Cloaking data to ease view creation, query expression, and query execution
    [J]. Murthy, S. (sudarshan.murthy@elseinstitute.org), 1600, Springer Verlag (7260 LNCS):
  • [38] DimensionSlice: A main-memory data layout for fast scans of multidimensional data
    Suh, Ilhyun
    Chung, Yon Dohn
    [J]. INFORMATION SYSTEMS, 2020, 94
  • [39] QUERY EXECUTION FOR LARGE RELATIONS ON FUNCTIONAL DISK SYSTEM
    KITSUREGAWA, M
    NAKANO, M
    TAKAGI, M
    [J]. PROCEEDINGS : FIFTH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, 1989, : 159 - 167
  • [40] Hybrid query execution engine for large attributed graphs
    Sakr, Sherif
    Elnikety, Sameh
    He, Yuxiong
    [J]. INFORMATION SYSTEMS, 2014, 41 : 45 - 73