An Empirical Performance Analysis on Hadoop via Optimizing the Network Heartbeat Period

Cited by: 3
Authors
Lee, Jaehwan [1]
Choi, June [1]
Roh, Hongchan [2]
Shin, Ji Sun [3]
Affiliations
[1] Korea Aerosp Univ, Sch Elect & Informat, Goyang, South Korea
[2] SK Telecom, Seoul, South Korea
[3] Sejong Univ, Dept Comp & Informat Secur, Seoul, South Korea
Funding
National Research Foundation of Singapore
Keywords
Hadoop; Heartbeat; Hadoop Ecosystem; Hive; TPC-H; Terasort Benchmark
DOI
10.3837/tiis.2018.11.005
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
To support large-scale clusters, Hadoop heartbeat messages are designed to carry important messages, including task scheduling and completion messages, via piggybacking, which reduces the number of messages received by the NameNode. Although Hadoop is designed and optimized for high-throughput batch processing, real-time processing of large amounts of data in Hadoop is increasingly important. This paper evaluates Hadoop's performance and costs when the heartbeat period is controlled to support latency-sensitive applications. Through an empirical study based on the Hadoop 2.0 (YARN) [1] architecture, we improve Hadoop's I/O performance as well as application performance by up to 13 percent compared to the default configuration. We offer a guideline, based on simple equations, that predicts the performance, costs, and limitations of the total system as a function of the heartbeat period. Through extensive experiments, we also show that Hive performance can be improved by tuning Hadoop's heartbeat periods.
Pages: 5252-5268
Page count: 17