SQL QUERY OPTIMIZATION FOR HIGHLY NORMALIZED BIG DATA

Cited by: 0
Authors
Golov, Nikolay I. [1]
Ronnback, Lars [2]
Affiliations
[1] Natl Res Univ, Fac Business & Management, Sch Business Informat, Higher Sch Econ, Dept Business Analyt, 20 Myasnitskaya St, Moscow 101000, Russia
[2] Stockholm Univ, Dept Comp Sci, SE-10691 Stockholm, Sweden
Source
Keywords
Big Data; massively parallel processing (MPP); database; normalization; analytics; ad-hoc; querying; modeling; performance
DOI
Not available
Chinese Library Classification (CLC) number
F [Economics]
Discipline classification code
02
Abstract
This paper describes an approach for fast ad-hoc analysis of Big Data inside a relational data model. The approach strives to achieve maximal utilization of highly normalized temporary tables through the merge join algorithm. It is designed for the Anchor modeling technique, which requires a very high level of table normalization. Anchor modeling is a novel data warehouse modeling technique, designed for classical databases and adapted by the authors of the article for a Big Data environment and a massively parallel processing (MPP) database. Anchor modeling provides flexibility and high-speed data loading, while the presented approach adds support for fast ad-hoc analysis of Big Data sets (tens of terabytes). Different approaches to query plan optimization are described and estimated for both row-based and column-based databases. Theoretical estimations and results of real data experiments carried out in a column-based MPP environment (HP Vertica) are presented and compared. The results show that the approach is particularly favorable when the available RAM resources are scarce, so that ad-hoc query execution switches from pure in-memory processing to spilling over to hard disk. Scaling is also investigated by running the same analysis on different numbers of nodes in the MPP cluster. Configurations of five, ten and twelve nodes were tested, using clickstream data from Avito, the biggest classifieds site in Russia.
Pages: 7-14
Page count: 8