SQL QUERY OPTIMIZATION FOR HIGHLY NORMALIZED BIG DATA

Times cited: 0
Authors
Golov, Nikolay I. [1 ]
Ronnback, Lars [2 ]
Affiliations
[1] Natl Res Univ, Fac Business & Management, Sch Business Informat, Higher Sch Econ, Dept Business Analyt, 20 Myasnitskaya St, Moscow 101000, Russia
[2] Stockholm Univ, Dept Comp Sci, SE-10691 Stockholm, Sweden
Source
Keywords
Big Data; massively parallel processing (MPP); database; normalization; analytics; ad-hoc; querying; modeling; performance;
DOI
Not available
Chinese Library Classification
F [Economics];
Discipline classification code
02;
Abstract
This paper describes an approach for fast ad-hoc analysis of Big Data inside a relational data model. The approach strives to achieve maximal utilization of highly normalized temporary tables through the merge join algorithm. It is designed for the Anchor modeling technique, which requires a very high level of table normalization. Anchor modeling is a novel data warehouse modeling technique, designed for classical databases and adapted by the authors for a Big Data environment and a massively parallel processing (MPP) database. Anchor modeling provides flexibility and high-speed data loading, while the presented approach adds support for fast ad-hoc analysis of Big Data sets (tens of terabytes). Different approaches to query plan optimization are described and estimated for row-based and column-based databases. Theoretical estimates and the results of real-data experiments carried out in a column-based MPP environment (HP Vertica) are presented and compared. The results show that the approach is particularly favorable when available RAM is scarce, so that ad-hoc queries switch from pure in-memory processing to spilling over to hard disk. Scaling is also investigated by running the same analysis on different numbers of nodes in the MPP cluster. Configurations of five, ten and twelve nodes were tested, using clickstream data from Avito, the biggest classifieds site in Russia.
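The merge join algorithm named in the abstract can be illustrated with a minimal Python sketch. This is a generic textbook merge join over two inputs that are already sorted by a shared key, not the authors' implementation and not Vertica's. The function name `merge_join`, the `Row` type, and the sample anchor/attribute rows are hypothetical and chosen only for illustration.

```python
from typing import Iterator, List, Tuple

# Hypothetical row shape: (surrogate key, attribute value).
Row = Tuple[int, str]


def merge_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[int, str, str]]:
    """Inner-join two lists that are ALREADY sorted by key (first element).

    Each input is scanned once, front to back, which is why the merge join
    suits pre-sorted tables: no hash table has to be built in memory.
    """
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1  # left side is behind, advance it
        elif left_key > right_key:
            j += 1  # right side is behind, advance it
        else:
            # Find the end of the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Pair every left row carrying this key with the whole right run.
            while i < len(left) and left[i][0] == left_key:
                for k in range(j, j_end):
                    yield (left_key, left[i][1], right[k][1])
                i += 1
            j = j_end


# Illustrative data only: two narrow tables sharing a surrogate key.
names = [(1, "alice"), (2, "bob"), (4, "dave")]
cities = [(2, "Moscow"), (2, "Kazan"), (3, "Omsk"), (4, "Perm")]
joined = list(merge_join(names, cities))
```

In a highly normalized model each attribute lives in its own narrow table, so a query that reassembles an entity performs many such joins. Keeping those tables sorted on the join key is the precondition that makes the single-pass scan above possible.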
Pages: 7-14
Number of pages: 8