SQL QUERY OPTIMIZATION FOR HIGHLY NORMALIZED BIG DATA

Cited by: 0
Authors
Golov, Nikolay I. [1 ]
Ronnback, Lars [2 ]
Affiliations
[1] Natl Res Univ, Higher Sch Econ, Fac Business & Management, Sch Business Informat, Dept Business Analyt, 20 Myasnitskaya St, Moscow 101000, Russia
[2] Stockholm Univ, Dept Comp Sci, SE-10691 Stockholm, Sweden
Source
Keywords
Big Data; massively parallel processing (MPP); database; normalization; analytics; ad-hoc; querying; modeling; performance
DOI
Not available
Chinese Library Classification number
F [Economics]
Discipline classification code
02
Abstract
This paper describes an approach for fast ad-hoc analysis of Big Data inside a relational data model. The approach strives to achieve maximal utilization of highly normalized temporary tables through the merge join algorithm. It is designed for the Anchor modeling technique, which requires a very high level of table normalization. Anchor modeling is a novel data warehouse modeling technique, designed for classical databases and adapted by the authors of this article for a Big Data environment and a massively parallel processing (MPP) database. Anchor modeling provides flexibility and high-speed data loading, while the presented approach adds support for fast ad-hoc analysis of Big Data sets (tens of terabytes). Different approaches to query plan optimization are described and estimated for row-based and column-based databases. Theoretical estimates and results of real-data experiments carried out in a column-based MPP environment (HP Vertica) are presented and compared. The results show that the approach is particularly favorable when available RAM is scarce, so that ad-hoc query execution switches from pure in-memory processing to spilling over to hard disk. Scaling is also investigated by running the same analysis on different numbers of nodes in the MPP cluster. Configurations of five, ten and twelve nodes were tested, using clickstream data from Avito, the largest classified ads site in Russia.
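The merge join named in the abstract can be illustrated with a minimal sketch in Python. This is a generic textbook sort-merge join, not the authors' implementation or Vertica's. The table names `user_name` and `user_city`, and the idea that each is a narrow (key, value) table already sorted by a shared surrogate key, are assumptions made only for illustration.

```python
from typing import Any, List, Tuple

Row = Tuple[int, Any]  # (surrogate key, attribute value); hypothetical layout


def merge_join(left: List[Row], right: List[Row]) -> List[Tuple[int, Any, Any]]:
    """Inner-join two lists of (key, value) rows that are already sorted by key.

    Each input is scanned once, front to back, so no hash table over either
    input has to be held in memory.
    """
    out: List[Tuple[int, Any, Any]] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1  # left key has no match yet, advance left
        elif lk > rk:
            j += 1  # right key has no match yet, advance right
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for _, right_value in right[j:j_end]:
                    out.append((lk, left[i][1], right_value))
                i += 1
            j = j_end
    return out


# Hypothetical narrow attribute tables, both sorted by the surrogate key.
user_name = [(1, "alice"), (2, "bob"), (4, "dave")]
user_city = [(1, "Moscow"), (3, "Kazan"), (4, "Stockholm"), (4, "Uppsala")]

joined = merge_join(user_name, user_city)
```

Because both inputs are consumed in key order, the join needs only the current run of equal keys in memory, which is one plausible reason such a plan would hold up when RAM is scarce.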
Pages: 7-14
Page count: 8