Optimizing data placement in heterogeneous Hadoop clusters

Authors
Runqun Xiong
Junzhou Luo
Fang Dong
Affiliation
[1] Southeast University, School of Computer Science and Engineering
Source
Cluster Computing | 2015 / Vol. 18
Keywords
Hadoop cluster; HDFS; Data placement; Heterogeneous; Replica
DOI
Not available
Abstract
The data placement decisions of the Hadoop distributed file system (HDFS) are very important for data locality, which is a primary criterion for task scheduling in the MapReduce model and ultimately affects application performance. The existing rack-aware data placement strategy and replication scheme of HDFS work well with the MapReduce framework in homogeneous Hadoop clusters, but in practice this placement policy can noticeably reduce MapReduce performance and may increase energy dissipation in heterogeneous environments. In addition, HDFS applies an inflexible default replica factor to every data block, which wastes storage space when the Hadoop system holds a lot of inactive data. In this paper, we propose a novel data placement strategy (SLDP) for heterogeneous Hadoop clusters. SLDP first uses a heterogeneity-aware algorithm to divide the nodes into several virtual storage tiers (VSTs), and then places data blocks across the nodes in each VST circuitously according to the hotness of the data. Furthermore, SLDP uses hotness-proportional replication to save disk space and also provides an effective power control function. Experimental results on two real data-intensive applications show that SLDP is energy-efficient and space-saving, and that it significantly improves MapReduce performance in a heterogeneous Hadoop cluster.
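The abstract names two ideas without giving formulas: grouping nodes into tiers by capability, and scaling the replica count with data hotness. The sketch below is only an illustration of those ideas under assumed rules, not the SLDP algorithm from the paper. The functions `assign_tiers` and `replica_count`, the single capability score per node, the equal-size tier split, and the linear hotness scaling are all hypothetical.

```python
from typing import Dict, List


def assign_tiers(node_capability: Dict[str, float], num_tiers: int) -> List[List[str]]:
    """Group nodes into tiers, most capable first.

    Hypothetical stand-in for virtual storage tiers: nodes are ranked by one
    assumed capability score and cut into roughly equal-sized groups.
    """
    ranked = sorted(node_capability, key=node_capability.get, reverse=True)
    if not ranked or num_tiers <= 0:
        return []
    size = -(-len(ranked) // num_tiers)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def replica_count(hotness: float, max_hotness: float,
                  min_replicas: int = 1, max_replicas: int = 3) -> int:
    """Scale the replica count linearly with hotness (assumed rule),
    clamped to the range [min_replicas, max_replicas]."""
    if max_hotness <= 0:
        return min_replicas
    scaled = round(max_replicas * hotness / max_hotness)
    return max(min_replicas, min(max_replicas, scaled))


if __name__ == "__main__":
    # Made-up capability scores and hotness values, for illustration only.
    nodes = {"a": 3.0, "b": 1.0, "c": 2.0, "d": 0.5}
    print(assign_tiers(nodes, 2))
    print(replica_count(hotness=0.0, max_hotness=10.0))
    print(replica_count(hotness=10.0, max_hotness=10.0))
```

Under these assumptions, inactive blocks fall back to the minimum replica count instead of a fixed default, which is the kind of storage saving the abstract describes.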
Pages: 1465 - 1480
Page count: 15