Fast OLAP Query Execution in Main Memory on Large Data in a Cluster

Cited by: 0

Authors:

Weidner, Martin [1]

Dees, Jonathan [1,2]

Sanders, Peter [2]

Affiliations:

[1] SAP AG, D-69190 Walldorf, Germany

[2] Karlsruhe Inst Technol, D-76128 Karlsruhe, Germany

Source:

2013 IEEE International Conference on Big Data | 2013

Keywords:

Distributed databases; Distributed computing; Parallel processing; Query processing; Data analysis; Data warehouses; Algorithm; Systems

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Main memory column-stores have proven to be efficient for processing analytical queries. Still, there has been little work in the context of clusters. Using only a single machine poses several restrictions: processing power and data volume are bounded by the number of cores and the main memory that fit on one tightly coupled system. To enable the processing of larger data sets, switching to a cluster becomes necessary. In this work, we explore techniques for efficient execution of analytical SQL queries on large amounts of data in a parallel database cluster while making maximal use of the available hardware. This includes precompiled query plans for efficient CPU utilization, full parallelization on single nodes and across the cluster, and efficient inter-node communication. We implement all features in a prototype for running a subset of TPC-H benchmark queries. We evaluate our implementation in a 128-node cluster running TPC-H queries with 30 000 gigabytes of uncompressed data. Currently, there are no official cluster results for more than 10 000 gigabytes of data, where we achieve up to one to two orders of magnitude better performance than the current record holder.


Pages: 7
