Hash Semi Cascade Join for Joining Multi-Way Map Reduce

Authors
Mohamed, Marwa Hussien [1]
Khafagy, Mohamed Helmy [2]
Affiliations
[1] Arab Acad Sci Technol & Maritime Transport, Dept Informat Syst, Cairo, Egypt
[2] Fayoum Univ, Dept Comp Sci, Cairo, Egypt
Keywords
Map-reduce; Hadoop; hash tables; semi-join; Two-way; Multi-way;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Map-reduce is a programming model popularized by Google since 2004. It is used to process large-scale datasets on a shared-nothing cluster. Map-reduce achieves high performance by partitioning the processing into small units of work that can run in parallel across thousands of nodes in the cluster. The rapid increase in data size has raised the importance of uncovering hidden patterns to acquire new knowledge and extract valuable information. However, map-reduce does not directly support the join operation. This paper discusses several two-way join algorithms and lists some advantages and disadvantages of each. We propose a new multi-way join algorithm, hash semi cascade join, for joining more than two datasets. It uses hash tables in the first phase to delete records that the join operation does not need as early as possible, which reduces the network bottleneck and increases performance. We compare the new algorithm with other multi-way joins: map side join, reduce side one shot join, and reduce side cascade join. Our experimental results show that map side join spends more time sorting data; it produces the join result with high performance on small datasets, but its running time increases as the data grows. Reduce side one shot join produces its join result in a time close to that of map side join. Reduce side cascade join takes more time to produce the final result. Hash semi cascade join achieves high performance by using hash tables. Because it reduces the number of shuffled records, as in reduce side one shot join and reduce side cascade join, it can join datasets of any size. In addition, using a hash table does not affect memory size.
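The abstract describes the idea only at a high level, so the following is a minimal single-process sketch of it, not the authors' Hadoop implementation. It assumes every dataset is a list of tuples that share one join key at the same position; the function names (`semi_join_filter`, `hash_join`, `hash_semi_cascade_join`) and the sample data are hypothetical. Phase 1 uses hash sets to drop records whose key cannot appear in the final result, and phase 2 joins the filtered datasets one after another (cascade).

```python
from collections import defaultdict


def semi_join_filter(records, key_index, allowed_keys):
    """Keep only the records whose join key is in allowed_keys."""
    return [r for r in records if r[key_index] in allowed_keys]


def hash_join(left, right, left_key, right_key):
    """Join two lists of tuples by probing a hash table built on `right`."""
    table = defaultdict(list)
    for r in right:
        table[r[right_key]].append(r)
    # .get avoids inserting empty entries for keys that have no match
    return [l + r for l in left for r in table.get(l[left_key], [])]


def hash_semi_cascade_join(datasets, key_index=0):
    """Filter early with hash sets, then join the datasets in sequence."""
    # Phase 1: find the keys present in every dataset and discard the rest.
    common = {r[key_index] for r in datasets[0]}
    for d in datasets[1:]:
        common &= {r[key_index] for r in d}
    filtered = [semi_join_filter(d, key_index, common) for d in datasets]

    # Phase 2: cascade join. The accumulated tuple keeps the key at
    # key_index because the left tuple always comes first.
    result = filtered[0]
    for d in filtered[1:]:
        result = hash_join(result, d, key_index, key_index)
    return result


# Hypothetical sample data: three datasets keyed on the first field.
users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(1, "book"), (3, "pen"), (4, "ink")]
payments = [(1, "card"), (2, "cash"), (3, "card")]

joined = hash_semi_cascade_join([users, orders, payments])
```

In a real map-reduce job the filtering would happen before the shuffle, so the records dropped in phase 1 would never cross the network; this sketch only illustrates the order of operations.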
Pages: 355-361
Page count: 7