An Efficient Improved Join Algorithm Using Map Reduce in Hadoop

Cited by: 0
Authors
Patel, Warish D. [1]
Vaghela, Dineshkumar B. [1]
Affiliation
[1] Parul Inst Technol, Dept Comp Sci & Engn, Vadodara, India
Keywords
Hadoop; Map/Reduce; Distributed Environment; Big Data; Joins; Multiple Join; Query Processing; Distributed Database; MAPREDUCE;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Information explosion is now a well-known phenomenon, and a vast amount of research is under way into how best to handle and process huge amounts of data. One idea for processing enormous quantities of data is Google's Map/Reduce, first introduced by Google engineers Jeffrey Dean and Sanjay Ghemawat [9]. It was designed for, and is still used at, Google for processing large amounts of raw data (such as crawled documents and web-request logs) to produce various kinds of derived data (such as inverted indices and web-page summaries). It is a simple yet powerful framework for implementing distributed applications without extensive prior knowledge of the intricacies of a distributed system. It is highly scalable and runs on a cluster of commodity machines with integrated mechanisms for fault tolerance. The programmer only has to write specialized map and reduce functions as part of the Map/Reduce job; the framework takes care of the rest. It distributes the data across the cluster, instantiates multiple copies of the map and reduce functions in parallel, and handles any system failures that occur during execution. Since its inception at Google, Map/Reduce has found many adopters. Prominent among them is the Apache Software Foundation, which has developed an open-source version of the Map/Reduce framework called Hadoop [2]. Hadoop is used by a number of large web-based corporations, such as Yahoo, Facebook, and Amazon, for various kinds of data-warehousing purposes. Facebook, for instance, uses it to store copies of internal logs and as a source for reporting and machine learning. Owing to its ease of use, installation, and implementation, Hadoop has found many uses among programmers. One of them is query evaluation over large datasets, and among the most important queries are joins.
This project explores the existing solutions, extends them, and proposes a few new ideas for joining datasets using Hadoop. The algorithms are divided into two categories: Two-Way joins and Multi-Way joins. Join algorithms are then discussed and evaluated under both categories. Options for pre-processing the data to improve performance are also explored. The results are expected to give insight into how good a fit Hadoop, or Map/Reduce in general, is for evaluating joins.
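To illustrate the division of labour the abstract describes (user-written map and reduce functions, with the framework handling the rest), here is a minimal in-memory sketch of a two-way equi-join in the commonly described reduce-side style. It is not the authors' algorithm and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `reduce_side_join`) and the tuple-based record layout are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import product


def map_phase(records, tag, key_index):
    """Emit (join_key, (tag, record)) pairs, as a mapper would.

    The tag records which input dataset each record came from.
    """
    for record in records:
        yield record[key_index], (tag, record)


def shuffle(pairs):
    """Group mapper output by key; a stand-in for the framework's shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Pair every left record with every right record that shares the key."""
    left = [rec for tag, rec in values if tag == "L"]
    right = [rec for tag, rec in values if tag == "R"]
    for l_rec, r_rec in product(left, right):
        yield key, l_rec, r_rec


def reduce_side_join(left, right, left_key=0, right_key=0):
    """Run the map, shuffle, and reduce steps sequentially in one process."""
    pairs = list(map_phase(left, "L", left_key))
    pairs += list(map_phase(right, "R", right_key))
    joined = []
    # Sorting the keys only makes the output order deterministic here.
    for key, values in sorted(shuffle(pairs).items()):
        joined.extend(reduce_phase(key, values))
    return joined
```

Calling `reduce_side_join(users, orders)` on two lists of tuples keyed on their first field returns one `(key, left_record, right_record)` triple per matching pair; keys present in only one input produce no output, which corresponds to an inner join. In a real cluster the map and reduce steps would run in parallel on different machines, which this single-process sketch does not attempt to model.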
Pages: 263-272
Page count: 10