SkipSJoin: A New Physical Design for Distributed Big Data Warehouses in Hadoop

Cited by: 4
Authors
Ramdane, Yassine [1]
Kabachi, Nadia [2]
Boussaid, Omar [1]
Bentayeb, Fadila [1]
Affiliations
[1] Univ Lyon, Lyon 2, ERIC EA 3083, 5 Ave Pierre Mendes, F-69676 Bron, France
[2] Univ Claude Bernard Lyon 1, Univ Lyon, ERIC EA 3083, 43 Blvd 11 Novembre 1918, F-69100 Villeurbanne, France
Source
CONCEPTUAL MODELING, ER 2019 | 2019 / Vol. 11788
Keywords
Load balancing; Bucket; Sort-merge-bucket join
DOI
10.1007/978-3-030-33223-5_21
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Hadoop uses horizontal partitioning to improve the performance of a big data warehouse. A major challenge when horizontally partitioning the tables of a big data warehouse is to reduce network traffic for a given workload. A common technique to avoid this issue, when performing a join operation, is to co-partition the tables of the data warehouse on their join key. However, in the existing partitioning schemes, executing a star join operation in Hadoop still needs many MapReduce cycles. In this paper, we combine a data-driven and a workload-driven model to create a new scheme for distributed big data warehouses over Hadoop, called "SkipSJoin". Our approach allows performing the star join operation in only one Spark stage, and allows skipping the loading of some unnecessary HDFS blocks. Our experiments show that our proposal outperforms some approaches in terms of query execution time.
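The co-partitioning idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the SkipSJoin scheme itself, whose details are not given in this record; it only shows the general technique of bucketing two tables on the same join key so that each bucket pair can be joined locally, without moving rows between partitions. The table contents, the field name `cust_id`, the bucket count, and the helper names `bucket_of`, `co_partition`, and `bucket_local_join` are all hypothetical.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for illustration


def bucket_of(key, num_buckets=NUM_BUCKETS):
    """Map a non-negative integer join key to a bucket id."""
    return key % num_buckets


def co_partition(rows, key_field, num_buckets=NUM_BUCKETS):
    """Group rows into buckets using the same function for every table."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row[key_field], num_buckets)].append(row)
    return buckets


def bucket_local_join(fact_buckets, dim_buckets, key_field):
    """Join matching bucket pairs; rows never cross bucket boundaries."""
    joined = []
    for bucket_id, fact_rows in fact_buckets.items():
        # Index only the dimension rows stored in the same bucket.
        dim_index = {d[key_field]: d for d in dim_buckets.get(bucket_id, [])}
        for f in fact_rows:
            d = dim_index.get(f[key_field])
            if d is not None:
                joined.append({**d, **f})
    return joined


# Hypothetical toy tables: one fact table and one dimension table.
fact = [
    {"cust_id": 1, "amount": 10},
    {"cust_id": 2, "amount": 20},
    {"cust_id": 5, "amount": 7},
]
dim = [
    {"cust_id": 1, "city": "Lyon"},
    {"cust_id": 2, "city": "Bron"},
    {"cust_id": 5, "city": "Villeurbanne"},
]

fact_buckets = co_partition(fact, "cust_id")
dim_buckets = co_partition(dim, "cust_id")
result = bucket_local_join(fact_buckets, dim_buckets, "cust_id")
```

Because both tables use the same bucketing function, rows with equal keys always land in the same bucket id, which is what lets a distributed engine join them without shuffling data over the network.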
Pages: 255-263
Number of pages: 9
Related papers
  • [1] SDWP: A New Data Placement Strategy for Distributed Big Data Warehouses in Hadoop
    Ramdane, Yassine
    Kabachi, Nadia
    Boussaid, Omar
    Bentayeb, Fadila
    [J]. BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, DAWAK 2019, 2019, 11708 : 189 - 205
  • [2] Design Process for Big Data Warehouses
    Di Tria, Francesco
    Lefons, Ezio
    Tangorra, Filippo
    [J]. 2014 INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2014, : 512 - 518
  • [3] Efficient Big Data Modelling and Organization for Hadoop Hive-Based Data Warehouses
    Costa, Eduarda
    Costa, Carlos
    Santos, Maribel Yasmina
    [J]. INFORMATION SYSTEMS, EMCIS 2017, 2017, 299 : 3 - 16
  • [4] Physical database design for data warehouses
    Labio, WJ
    Quass, D
    Adelberg, B
    [J]. 13TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING - PROCEEDINGS, 1997, : 277 - 288
  • [5] A Genetic Optimization Physical Planner for Big Data Warehouses
    Benkrid, Soumia
    Mestoui, Yacine
    Bellatreche, Ladjel
    Ordonez, Carlos
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 406 - 412
  • [6] Hadoop Distributed File System for Big data analysis
    Almansouri, Hatim Talal
    Masmoudi, Youssef
    [J]. PROCEEDINGS OF 2019 IEEE 4TH WORLD CONFERENCE ON COMPLEX SYSTEMS (WCCS' 19), 2019, : 257 - 261
  • [7] A formal approach to the design of distributed data warehouses
    Zhao, J
    [J]. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2005, PT 2, 2005, 3481 : 1235 - 1244
  • [8] Building a novel physical design of a distributed big data warehouse over a Hadoop cluster to enhance OLAP cube query performance
    Ramdane, Yassine
    Boussaid, Omar
    Boukraa, Doulkifli
    Kabachi, Nadia
    Bentayeb, Fadila
    [J]. PARALLEL COMPUTING, 2022, 111
  • [9] Scheduling in Big Data Heterogeneous Distributed System Using Hadoop
    Thakkar, Shraddha
    Patel, Sanjay
    [J]. PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ICT FOR SUSTAINABLE DEVELOPMENT ICT4SD 2015, VOL 2, 2016, 409 : 119 - 131
  • [10] Building Data Warehouses in the Era of Big Data: An Approach for Scalable and Flexible Big Data Warehouses
    Costa, Carlos
    Santos, Maribel Yasmina
    [J]. ADVANCED INFORMATION SYSTEMS ENGINEERING (CAISE 2019), 2019, 11483 : 693 - 695