Optimizing data placement in heterogeneous Hadoop clusters

Authors
Runqun Xiong
Junzhou Luo
Fang Dong
Affiliation
[1] Southeast University, School of Computer Science and Engineering
Source
Cluster Computing, 2015, Vol. 18
Keywords
Hadoop cluster; HDFS; Data placement; Heterogeneous; Replica
DOI
Not available
Abstract
Data placement decisions in the Hadoop distributed file system (HDFS) largely determine data locality. Data locality is a primary criterion for task scheduling in the MapReduce model and ultimately affects application performance. HDFS's existing rack-aware data placement strategy and replication scheme work well with the MapReduce framework in homogeneous Hadoop clusters. In heterogeneous environments, however, this placement policy can noticeably degrade MapReduce performance and increase energy consumption. In addition, HDFS applies a fixed default replication factor to every data block, which wastes storage space when the system holds a large amount of inactive data. In this paper, we propose SLDP, a novel data placement strategy for heterogeneous Hadoop clusters. SLDP first uses a heterogeneity-aware algorithm to divide the nodes into several virtual storage tiers (VSTs). It then places data blocks cyclically across the nodes in each VST according to the hotness of the data. Furthermore, SLDP uses hotness-proportional replication to save disk space and provides an effective power control function. Experimental results on two real data-intensive applications show that SLDP is energy-efficient and space-saving. They also show that it significantly improves MapReduce performance in a heterogeneous Hadoop cluster.
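The abstract only outlines SLDP's three mechanisms: tiering heterogeneous nodes, cyclic placement within a tier by data hotness, and hotness-proportional replication. The Python sketch below illustrates how these ideas could fit together. It is a minimal sketch under stated assumptions and not the authors' algorithm. Everything in it is hypothetical:

- `Node` and its single `speed` score stand in for node heterogeneity.
- `build_tiers` sorts nodes by `speed` and splits them into near-equal groups.
- Hotness is a number in [0, 1].
- `TieredPlacer` linearly maps hotness to a tier and to a replica count between `min_replicas` and `max_replicas`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    speed: float  # hypothetical heterogeneity metric (higher = more capable node)


def build_tiers(nodes: List[Node], num_tiers: int) -> List[List[Node]]:
    """Split nodes into virtual storage tiers, most capable tier first.

    Assumption: sort by speed and cut into contiguous, near-equal groups.
    """
    if not 1 <= num_tiers <= len(nodes):
        raise ValueError("num_tiers must be between 1 and the number of nodes")
    ranked = sorted(nodes, key=lambda n: n.speed, reverse=True)
    size, extra = divmod(len(ranked), num_tiers)
    tiers, start = [], 0
    for i in range(num_tiers):
        end = start + size + (1 if i < extra else 0)  # spread the remainder over the first tiers
        tiers.append(ranked[start:end])
        start = end
    return tiers


class TieredPlacer:
    """Hypothetical hotness-driven placement: hot blocks go to fast tiers and get more replicas."""

    def __init__(self, tiers: List[List[Node]], min_replicas: int = 1, max_replicas: int = 3):
        self.tiers = tiers
        self.min_replicas = min_replicas
        self.max_replicas = max_replicas
        self._cursor = [0] * len(tiers)  # per-tier round-robin start position

    def tier_for(self, hotness: float) -> int:
        # Map hotness 1.0 to tier 0 (fastest) and hotness 0.0 to the last tier.
        k = len(self.tiers)
        return min(int((1.0 - hotness) * k), k - 1)

    def replicas_for(self, hotness: float) -> int:
        # Linear interpolation between min and max replica counts, rounded half up.
        span = self.max_replicas - self.min_replicas
        return self.min_replicas + int(hotness * span + 0.5)

    def place(self, hotness: float) -> List[str]:
        """Return the node names chosen for one block's replicas."""
        if not 0.0 <= hotness <= 1.0:
            raise ValueError("hotness must lie in [0, 1]")
        t = self.tier_for(hotness)
        tier = self.tiers[t]
        count = min(self.replicas_for(hotness), len(tier))  # at most one replica per node
        start = self._cursor[t]
        chosen = [tier[(start + i) % len(tier)].name for i in range(count)]
        self._cursor[t] = (start + 1) % len(tier)  # rotate so the next block starts one node later
        return chosen
```

Suppose six nodes are split into three tiers with the default replica range of 1 to 3. A block with hotness 0.9 would then get two replicas on the two fastest nodes, because the replica count is capped by the tier size. A block with hotness 0.05 would get a single replica in the slowest tier.

Capping replicas at the tier size keeps every replica on a distinct node. A real system would more likely spill extra replicas into neighbouring tiers or racks.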
Pages: 1465–1480 (16 pages)