SQL QUERY OPTIMIZATION FOR HIGHLY NORMALIZED BIG DATA

Cited by: 0
Authors
Golov, Nikolay I. [1]
Ronnback, Lars [2]
Affiliations
[1] Natl Res Univ, Fac Business & Management, Sch Business Informat, Higher Sch Econ, Dept Business Analyt, 20 Myasnitskaya St, Moscow 101000, Russia
[2] Stockholm Univ, Dept Comp Sci, SE-10691 Stockholm, Sweden
Source
Keywords
Big Data; massively parallel processing (MPP); database; normalization; analytics; ad-hoc; querying; modeling; performance;
DOI
Not available
Chinese Library Classification
F [Economics];
Subject classification code
02;
Abstract
This paper describes an approach for fast ad-hoc analysis of Big Data inside a relational data model. The approach strives to achieve maximal utilization of highly normalized temporary tables through the merge join algorithm. It is designed for the Anchor modeling technique, which requires a very high level of table normalization. Anchor modeling is a novel data warehouse modeling technique, designed for classical databases and adapted by the authors of the article for a Big Data environment and a massively parallel processing (MPP) database. Anchor modeling provides flexibility and high speed of data loading, while the presented approach adds support for fast ad-hoc analysis of Big Data sets (tens of terabytes). Different approaches to query plan optimization are described and estimated, for both row-based and column-based databases. Theoretical estimations are presented and compared with the results of real-data experiments carried out in a column-based MPP environment (HP Vertica). The results show that the approach is particularly favorable when the available RAM is scarce, so that executing ad-hoc queries forces a switch from pure in-memory processing to spilling over to hard disk. Scaling is also investigated by running the same analysis on different numbers of nodes in the MPP cluster. Configurations of five, ten and twelve nodes were tested, using click stream data from Avito, the biggest classifieds site in Russia.
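The abstract leans on the merge join algorithm, so a minimal sketch of that general technique may help. It is not the authors' implementation and does not reflect how HP Vertica executes joins internally; the table names and sample rows (`user_city`, `user_device`) are hypothetical. It assumes both inputs are already sorted by the join key, which is the precondition that lets the join run in a single forward pass.

```python
from typing import Iterator, Sequence, Tuple

Row = Tuple[int, str]  # (key, attribute value); a hypothetical narrow table row


def merge_join(left: Sequence[Row], right: Sequence[Row]) -> Iterator[Tuple[int, str, str]]:
    """Inner-join two row sequences that are ALREADY sorted by key."""
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1  # no partner on the right, advance the left cursor
        elif left_key > right_key:
            j += 1  # no partner on the left, advance the right cursor
        else:
            # Locate the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Pair every left row of this key with every right row of the run.
            while i < len(left) and left[i][0] == left_key:
                for k in range(j, j_end):
                    yield (left_key, left[i][1], right[k][1])
                i += 1
            j = j_end


# Hypothetical single-attribute tables, both sorted by the shared key.
user_city = [(1, "Moscow"), (2, "Kazan"), (4, "Omsk")]
user_device = [(1, "mobile"), (3, "desktop"), (4, "tablet")]
joined = list(merge_join(user_city, user_device))
```

Because each cursor only moves forward, the join needs no hash table over either input; the only buffered state is the current run of equal keys. This is a general property of merge joins and one plausible reason such a plan suits memory-constrained settings.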
Pages: 7-14
Page count: 8
Related papers
50 in total
  • [31] Recursive SQL query optimization with k-iteration lookahead
    Ghazal, Ahmad
    Crolotte, Alain
    Seid, Dawit
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2006, 4080 : 348 - 357
  • [32] SQL Query Optimization in Content Based Image Retrieval Systems
    Angelescu, Nicoleta
    Coanda, Henri George
    Caciula, Ion
    Dragoi, Ioan Catalin
    Albu, Felix
    2016 INTERNATIONAL CONFERENCE ON COMMUNICATIONS (COMM 2016), 2016, : 395 - 398
  • [33] Diversification on big data in query processing
    Zhang, Meifan
    Wang, Hongzhi
    Li, Jianzhong
    Gao, Hong
    FRONTIERS OF COMPUTER SCIENCE, 2020, 14 (04)
  • [34] Diversification on big data in query processing
    Meifan Zhang
    Hongzhi Wang
    Jianzhong Li
    Hong Gao
    Frontiers of Computer Science, 2020, 14
  • [35] Query grouping-based multi-query optimization framework for interactive SQL query engines on Hadoop
    Chen, Ling
    Lin, Yan
    Wang, Jingchang
    Huang, Heqing
    Chen, Donghui
    Wu, Yong
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (19):
  • [36] iHOME: Index-Based JOIN Query Optimization for Limited Big Data Storage
    Radhya Sahal
    Marwah Nihad
    Mohamed H. Khafagy
    Fatma A. Omara
    Journal of Grid Computing, 2018, 16 : 345 - 380
  • [37] Normalized Storage Model Construction and Query Optimization of Book Multi-Source Heterogeneous Massive Data
    Wang, Dailin
    Liu, Lina
    Liu, Yali
    IEEE ACCESS, 2023, 11 : 96543 - 96553
  • [38] iHOME: Index-Based JOIN Query Optimization for Limited Big Data Storage
    Sahal, Radhya
    Nihad, Marwah
    Khafagy, Mohamed H.
    Omara, Fatma A.
    JOURNAL OF GRID COMPUTING, 2018, 16 (02) : 345 - 380
  • [39] Efficient ELM-Based Two Stages Query Processing Optimization for Big Data
    Ding, Linlin
    Liu, Yu
    Song, Baoyan
    Xin, Junchang
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2015, 2015
  • [40] Multihoming Big Data Network Using Blockchain-Based Query Optimization Scheme
    Jagdish, Mukta
    Anand, Neetu
    Gaurav, Kumar
    Baseer, Samad
    Alqahtani, Abdullah
    Saravanan, V
    WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2022, 2022