Adaptive Replication Management in HDFS Based on Supervised Learning

Cited by: 27
Authors
Bui, Dinh-Mao [1]
Hussain, Shujaat [1]
Huh, Eui-Nam [1]
Lee, Sungyoung [1]
Affiliations
[1] Kyung Hee Univ, Dept Comp Engn, Suwon 446701, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Replication; HDFS; proactive prediction; optimization; Bayesian learning; Gaussian process; erasure codes
DOI
10.1109/TKDE.2016.2523510
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The number of applications built on Apache Hadoop is increasing dramatically, owing to the robustness and dynamic features of the system. At the heart of Apache Hadoop, the Hadoop Distributed File System (HDFS) provides reliability and high availability for computation by applying static replication by default. However, because of the parallel operations at the application layer, access rates differ widely from one data file to another. Consequently, maintaining the same replication mechanism for every data file degrades performance. After rigorously considering the drawbacks of HDFS replication, this paper proposes an approach that dynamically replicates data files based on predictive analysis. With the help of probability theory, the utilization of each data file can be predicted and used to create a corresponding replication strategy. Popular files are then replicated according to their own access potential. For the remaining low-potential files, an erasure code is applied to maintain reliability. Hence, compared with the default scheme, the approach improves availability while preserving reliability. Furthermore, complexity reduction is applied to make the prediction more effective when dealing with Big Data.
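The idea in the abstract (predict each file's access potential, replicate hot files, erasure-code cold ones) can be illustrated with a minimal sketch. This is not the paper's implementation: the record gives no model details, so the RBF kernel, its hyperparameters, the thresholds, and the function names `gp_predict` and `choose_strategy` are all hypothetical choices made for illustration, using plain Gaussian process regression in NumPy.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=2.0, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of time points."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(t_train, y_train, t_test, noise=1e-2):
    """Posterior mean of a GP fitted to mean-centred access counts.

    Hyperparameters are fixed here (an assumption); a real system
    would learn them from the access history.
    """
    mean = y_train.mean()
    K = rbf_kernel(t_train, t_train) + noise * np.eye(len(t_train))
    K_star = rbf_kernel(t_test, t_train)
    alpha = np.linalg.solve(K, y_train - mean)
    return mean + K_star @ alpha

def choose_strategy(predicted_rate, hot_threshold=10.0, base_replicas=3,
                    max_replicas=10, accesses_per_replica=20.0):
    """Map a predicted access rate to a storage strategy.

    All thresholds are hypothetical. Cold files keep a single copy
    protected by an erasure code; hot files get extra replicas.
    """
    if predicted_rate < hot_threshold:
        return ("erasure_code", 1)
    extra = int(predicted_rate // accesses_per_replica)
    return ("replicate", min(base_replicas + extra, max_replicas))

if __name__ == "__main__":
    # Hourly access counts for one file (made-up numbers).
    hours = np.arange(6.0)
    accesses = np.array([12.0, 18.0, 25.0, 31.0, 40.0, 46.0])
    next_hour = gp_predict(hours, accesses, np.array([6.0]))[0]
    print(next_hour, choose_strategy(next_hour))
```

The policy is deliberately separated from the predictor, so a different forecasting model could be swapped in without touching the replication decision.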
Pages: 1369-1382
Number of pages: 14
Related papers
50 in total
  • [1] Supervised Learning based HDFS Replication Management System
    Ilakiyaa, R.
    Nalini, N. J.
    [J]. 2017 INTERNATIONAL CONFERENCE ON TECHNICAL ADVANCEMENTS IN COMPUTERS AND COMMUNICATIONS (ICTACC), 2017, : 116 - 120
  • [2] Replication Management Framework for HDFS based on Prediction Technique
    Bui, Dinh-Mao
    Thien Huynh-The
    Lee, Sungyoung
    Li, Bin
    Wang, Jin
    [J]. 2015 THIRD INTERNATIONAL CONFERENCE ON ADVANCED CLOUD AND BIG DATA, 2015, : 58 - 63
  • [3] An efficient replication management system for HDFS management
    Swaroopa K.
    Satya Phani Kumari A.
    Manne N.
    Satpathy R.
    Pavan Kumar T.
    [J]. Materials Today: Proceedings, 2023, 80 : 2799 - 2802
  • [4] Dynamic Replication Policy on HDFS Based on Machine Learning Clustering
    Ahmed, Motaz A.
    Khafagy, Mohamed H.
    Shaheen, Masoud E.
    Kaseb, Mostafa R.
    [J]. IEEE ACCESS, 2023, 11 : 18551 - 18559
  • [5] Multicast-based Replication for Hadoop HDFS
    Wu, Jiadong
    Hong, Bo
    [J]. 2015 16TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2015, : 143 - 148
  • [6] Towards Adaptive Replication for Hot/Cold Blocks in HDFS using MemCached
    Liu, Pinchao
    Maruf, Adnan
    Yusuf, Farzana Beente
    Jahan, Labiba
    Xu, Hailu
    Guan, Boyuan
    Hu, Liting
    Iyengar, Sitharama S.
    [J]. 2019 2ND INTERNATIONAL CONFERENCE ON DATA INTELLIGENCE AND SECURITY (ICDIS 2019), 2019, : 188 - 194
  • [7] Placement Scheduling for Replication in HDFS Based on Probabilistic Approach
    Bui, Dinh-Mao
    Lee, Sungyoung
    [J]. INCLUSIVE SMART CITIES AND DIGITAL HEALTH, 2016, 9677 : 314 - 320
  • [8] A Metadata Management Mechanism Based on HDFS
    Chen, Xiaofeng
    Lou, Yuansheng
    Hu, Dongmei
    [J]. Applied Decisions in Area of Mechanical Engineering and Industrial Manufacturing, 2014, 577 : 1026 - 1029
  • [9] A supervised learning network based on adaptive resonance theory
    Zhou, J
    Bennett, S
    [J]. INTERNATIONAL JOURNAL OF NEURAL SYSTEMS, 1997, 8 (02) : 239 - 246
  • [10] Classification based Metadata Management for HDFS
    Chandrasekar, Ashok
    Chandrasekar, Karthik
    Ramasatagopan, Harini
    Rafica, A. R.
    Balasubramaniyan, Jagadeesh
    [J]. 2012 IEEE 14TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2012 IEEE 9TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (HPCC-ICESS), 2012, : 1021 - 1026