SQL QUERY OPTIMIZATION FOR HIGHLY NORMALIZED BIG DATA

Cited by: 0

Authors:

Golov, Nikolay I. [1]

Ronnback, Lars [2]

Affiliations:

[1] Natl Res Univ, Fac Business & Management, Sch Business Informat, Higher Sch Econ, Dept Business Analyt, 20 Myasnitskaya St, Moscow 101000, Russia

[2] Stockholm Univ, Dept Comp Sci, SE-10691 Stockholm, Sweden

Source:

BIZNES INFORMATIKA-BUSINESS INFORMATICS | 2015, Vol. 33, No. 03

Keywords:

Big Data; massively parallel processing (MPP); database; normalization; analytics; ad-hoc; querying; modeling; performance;

DOI:

Not available

Chinese Library Classification:

F [Economics];

Discipline classification code:

02;

Abstract:

This paper describes an approach for fast ad-hoc analysis of Big Data inside a relational data model. The approach strives to achieve maximal utilization of highly normalized temporary tables through the merge join algorithm. It is designed for the Anchor modeling technique, which requires a very high level of table normalization. Anchor modeling is a novel data warehouse modeling technique, designed for classical databases and adapted by the authors of the article for a Big Data environment and a massively parallel processing (MPP) database. Anchor modeling provides flexibility and high-speed data loading, while the presented approach adds support for fast ad-hoc analysis of Big Data sets (tens of terabytes). Different approaches to query plan optimization are described and estimated for row-based and column-based databases. Theoretical estimates and the results of real-data experiments carried out in a column-based MPP environment (HP Vertica) are presented and compared. The results show that the approach is particularly favorable when available RAM is scarce, so that executing ad-hoc queries forces a switch from pure in-memory processing to spilling over to hard disk. Scaling is also investigated by running the same analysis on different numbers of nodes in the MPP cluster. Configurations of five, ten and twelve nodes were tested, using clickstream data from Avito, the biggest classifieds site in Russia.

Pages: 7-14

Page count: 8

