Optimizing data placement in heterogeneous Hadoop clusters

Authors
Runqun Xiong
Junzhou Luo
Fang Dong
Affiliation
[1] Southeast University, School of Computer Science and Engineering
Source
Cluster Computing | 2015 / Vol. 18
Keywords
Hadoop cluster; HDFS; Data placement; Heterogeneous; Replica
Abstract
The data placement decisions of the Hadoop distributed file system (HDFS) are critical to data locality, which is a primary criterion for task scheduling in the MapReduce model and ultimately affects application performance. HDFS's existing rack-aware data placement strategy and replication scheme work well with the MapReduce framework in homogeneous Hadoop clusters, but in practice this placement policy can noticeably reduce MapReduce performance and may increase energy dissipation in heterogeneous environments. In addition, HDFS applies a fixed default replication factor to every data block, which wastes storage space when the system holds a large amount of inactive data. In this paper, we propose a novel data placement strategy (SLDP) for heterogeneous Hadoop clusters. SLDP first uses a heterogeneity-aware algorithm to divide the nodes into several virtual storage tiers (VSTs), and then places data blocks across the nodes in each VST circuitously according to the hotness of the data. Furthermore, SLDP uses hotness-proportional replication to save disk space and also provides an effective power control function. Experimental results on two real data-intensive applications show that SLDP is energy-efficient, space-saving, and able to significantly improve MapReduce performance in a heterogeneous Hadoop cluster.
Pages: 1465-1480
Page count: 15
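The abstract names three steps: grouping nodes into virtual storage tiers, placing blocks by hotness, and scaling replicas with hotness. The sketch below is only a rough illustration of that idea, not the SLDP algorithm from the paper, whose details the abstract does not give. The `Node.score` field, the equal-size tier split, the linear replica formula, the hotness range of 0 to 1, and the round-robin assignment are all assumptions made here for illustration; power control is not modeled.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Node:
    name: str
    score: float  # hypothetical capability score (higher means a faster node)


def build_tiers(nodes, tier_count):
    """Sort nodes by score and cut them into at most tier_count groups, fastest first."""
    ranked = sorted(nodes, key=lambda n: n.score, reverse=True)
    size = -(-len(ranked) // tier_count)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def replica_count(hotness, min_replicas=1, max_replicas=3):
    """Assumed rule: scale the replica count linearly with hotness in [0, 1]."""
    hotness = min(1.0, max(0.0, hotness))
    return min_replicas + round(hotness * (max_replicas - min_replicas))


def place_blocks(blocks, tiers):
    """Send hotter blocks to faster tiers, then assign replicas round-robin in that tier."""
    cursors = [cycle(tier) for tier in tiers]  # one round-robin cursor per tier
    placement = {}
    for block_id, hotness in blocks.items():
        hotness = min(1.0, max(0.0, hotness))
        tier_index = min(len(tiers) - 1, int((1.0 - hotness) * len(tiers)))
        # Never ask for more replicas than the tier has distinct nodes.
        wanted = min(replica_count(hotness), len(tiers[tier_index]))
        placement[block_id] = [next(cursors[tier_index]).name for _ in range(wanted)]
    return placement


if __name__ == "__main__":
    cluster = [Node("a", 9), Node("b", 8), Node("c", 5),
               Node("d", 4), Node("e", 2), Node("f", 1)]
    tiers = build_tiers(cluster, 3)
    print(place_blocks({"hot": 0.9, "warm": 0.5, "cold": 0.1}, tiers))
```

Keeping all replicas of a block inside one tier is a simplification chosen to keep the sketch short; a real policy would also have to respect rack awareness and fault-tolerance constraints.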