Hadoop Distributed File System for Big data analysis

Cited by: 2
Authors
Almansouri, Hatim Talal [1]
Masmoudi, Youssef [1]
Affiliations
[1] Saudi Elect Univ, Riyadh, Saudi Arabia
Keywords
Hadoop; MapReduce; HDFS; DataNode; NameNode; Big Data Analysis
DOI
10.1109/icocs.2019.8930804
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Hadoop is a framework for processing data volumes too large for conventional systems to handle. Its file management layer, the Hadoop Distributed File System (HDFS), consists of a NameNode and DataNodes, and the data is divided into blocks based on the total size of the dataset. Hadoop also provides MapReduce, which processes the dataset in a map phase followed by a reduce phase. Using Hadoop for big data analysis has revealed important information that can be used for analytical purposes and for enabling new products. Big data can be found in many different sources, such as social networks, web server logs, broadcast audio streams, and banking transactions. In this paper, we illustrate the main steps to set up Hadoop and MapReduce. The version illustrated in this work is Hadoop 3.1.1, the latest release at the time of writing. Simplified pseudocode is provided to show the functionality of the Map class and the Reduce class. The developed steps are applied to a given example that can be generalized to larger datasets.
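The two mechanisms named in the abstract, block division in HDFS and the map and reduce phases, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's pseudocode: the word-count task, the function names, and the in-memory shuffle step are all hypothetical, and the 128 MiB block size is the usual Hadoop 3.x default for dfs.blocksize, which a cluster may override.

```python
from collections import defaultdict

# Assumption: 128 MiB is the common default HDFS block size in Hadoop 3.x.
BLOCK_SIZE_BYTES = 128 * 1024 * 1024


def block_count(file_size_bytes, block_size_bytes=BLOCK_SIZE_BYTES):
    """Return how many blocks a file of the given size would occupy."""
    if file_size_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("sizes must be non-negative and block size positive")
    # Integer ceiling division: the last block may be only partly filled.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(block_count(300 * 1024 * 1024))
    print(word_count(["big data", "Big Hadoop"]))
```

In a real cluster the map calls would run in parallel on the DataNodes holding each block, and the shuffle would move data across the network; here everything runs in one process purely to show the data flow.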
Pages: 257-261
Number of pages: 5