IDP: An Innovative Data Placement Algorithm for Hadoop Systems

Cited by: 2
Authors
Lee, Chia-Wei [1]
Huang, Horng-Chyau [1]
Hsieh, Sun-Yuan [1,2,3]
Affiliations
[1] Natl Cheng Kung Univ, Inst Med Informat, 1 Univ Rd, Tainan 701, Taiwan
[2] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan 701, Taiwan
[3] Natl Cheng Kung Univ, Inst Mfg Informat & Syst, Tainan 701, Taiwan
Keywords
Data Placement; Hadoop; Heterogeneous; MapReduce
DOI
10.3233/978-1-61499-484-8-49
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a data placement strategy to address the imbalanced workload problem on DataNodes. Based on the computing capability of each node in a heterogeneous Hadoop cluster, the proposed strategy balances the data stored on each DataNode so that data transfer time is greatly reduced, which in turn improves overall Hadoop performance. Experimental results demonstrate that the proposed strategy significantly decreases execution time and thus improves Hadoop performance in a heterogeneous cluster.
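The abstract does not spell out IDP's exact allocation rule. As a hedged illustration only, the sketch below assumes that data blocks are assigned to DataNodes in proportion to a per-node processing rate, and compares the resulting finish time with an even split. The `Node` type, the `rate` values, and the helpers `proportional_placement`, `even_placement`, and `makespan` are hypothetical names introduced here, not taken from the paper or from Hadoop's APIs.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    rate: float  # hypothetical: data blocks this node can process per second


def proportional_placement(nodes, total_blocks):
    """Assumed rule: split total_blocks in proportion to each node's rate.

    Uses the largest-remainder method so the integer counts sum exactly
    to total_blocks.
    """
    total_rate = sum(n.rate for n in nodes)
    shares = [total_blocks * n.rate / total_rate for n in nodes]
    counts = [int(s) for s in shares]
    leftover = total_blocks - sum(counts)
    # Hand the remaining blocks to the nodes with the largest fractional parts.
    order = sorted(range(len(nodes)), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return {n.name: c for n, c in zip(nodes, counts)}


def even_placement(nodes, total_blocks):
    """Baseline: same number of blocks per node, regardless of speed."""
    base, extra = divmod(total_blocks, len(nodes))
    return {n.name: base + (1 if i < extra else 0) for i, n in enumerate(nodes)}


def makespan(nodes, placement):
    """Time until the slowest node finishes its local blocks.

    Ignores data transfer; with an uneven workload, fast nodes would instead
    sit idle or fetch remote blocks, which is the transfer cost the
    abstract refers to.
    """
    return max(placement[n.name] / n.rate for n in nodes)


if __name__ == "__main__":
    cluster = [Node("fast", 4.0), Node("mid", 2.0), Node("slow", 1.0)]
    for rule in (even_placement, proportional_placement):
        p = rule(cluster, 70)
        print(rule.__name__, p, makespan(cluster, p))
```

With three nodes whose rates are 4, 2, and 1 and 70 blocks, the even split leaves the slow node with 23 blocks (makespan 23.0), while the proportional split assigns 40/20/10 blocks and every node finishes at 10.0. The largest-remainder step is one simple way to keep integer block counts; the paper may handle rounding differently.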
Pages: 49-58
Page count: 10
Related papers (50 in total)
  • [31] A Data Locality Optimization Algorithm for Large-scale Data Processing in Hadoop
    Zhao, Yanrong
    Wang, Weiping
    Meng, Dan
    Yang, Xiufeng
    Zhang, Shubin
    Li, Jun
    Guan, Gang
    2012 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (ISCC), 2012, : 655 - 661
  • [32] A Discretization Algorithm for Meteorological Data and its Parallelization Based on Hadoop
    Liu, Chao
    Jin, Wen
    Yu, Yuting
    Qiu, Taorong
    Bai, Xiaoming
    Zou, Shuilong
    2017 INTERNATIONAL CONFERENCE ON CLOUD TECHNOLOGY AND COMMUNICATION ENGINEERING (CTCE2017), 2017, 910
  • [33] A parallel clustering algorithm for Logs Data Based on Hadoop Platform
    Huo, Jiuyuan
    Weng, Jian
    Qu, Hong
    2019 THE 3RD INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPILATION, COMPUTING AND COMMUNICATIONS (HP3C 2019), 2019, : 90 - 94
  • [34] Massive data MapReduce fingerprint discriminant algorithm Based on Hadoop
    Lu, Wei
    Huang, Jun
    Hong, Lin
    INFORMATION TECHNOLOGY APPLICATIONS IN INDUSTRY, PTS 1-4, 2013, 263-266 : 2655 - +
  • [35] Parallel Implementation of PrePost Algorithm Based on Hadoop for Big Data
    Rochd, Yassir
    Hafidi, Imad
    2018 IEEE 5TH INTERNATIONAL CONGRESS ON INFORMATION SCIENCE AND TECHNOLOGY (IEEE CIST'18), 2018, : 24 - 28
  • [36] A Data Placement Algorithm for Data Intensive Applications in Cloud
    Zhao, Qing
    Xiong, Congcong
    Zhang, Kunyu
    Yue, Yang
    Yang, Jucheng
    INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2016, 9 (02): : 145 - 155
  • [37] RSEDP: an effective hybrid data placement algorithm for large-scale storage systems
    Xiao, Nong
    Chen, Tao
    Liu, Fang
    JOURNAL OF SUPERCOMPUTING, 2011, 55 : 103 - 122
  • [38] Optimal Approximation Algorithm of Virtual Machine Placement for Data Latency Minimization in Cloud Systems
    Kuo, Jian-Jhih
    Yang, Hsiu-Hsien
    Tsai, Ming-Jer
    2014 PROCEEDINGS IEEE INFOCOM, 2014, : 1303 - 1311
  • [39] RSEDP: an effective hybrid data placement algorithm for large-scale storage systems
    Xiao, Nong
    Chen, Tao
    Liu, Fang
    JOURNAL OF SUPERCOMPUTING, 2011, 55 (01): : 103 - 122
  • [40] Research on an Innovative Algorithm for Optimizing Intelligent Image Data Systems Based on Deep Learning
    Dong, Xiaonan
    Song, Yingbin
    2024 IEEE 7TH INTERNATIONAL CONFERENCE ON AUTOMATION, ELECTRONICS AND ELECTRICAL ENGINEERING, AUTEEE, 2024, : 381 - 386